You should be able to do something like this (assuming an input file formatted
as: String, IntVal, LongVal)
import org.apache.spark.sql.types._
val recSchema = StructType(List(
  StructField("strVal", StringType, false),
  StructField("intVal", IntegerType, false),
  StructField("longVal", LongType, false)))
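To apply that schema, a minimal sketch assuming Spark 1.2's applySchema API, an existing SparkContext named sc, and a hypothetical comma-delimited input file data.txt:

```scala
import org.apache.spark.sql._

// sc is an existing SparkContext; "data.txt" is a hypothetical
// comma-delimited file matching the schema above
val sqlContext = new SQLContext(sc)
val rowRDD = sc.textFile("data.txt")
  .map(_.split(","))
  .map(p => Row(p(0), p(1).trim.toInt, p(2).trim.toLong))
// applySchema returns a SchemaRDD in 1.2 (renamed createDataFrame in 1.3)
val recs = sqlContext.applySchema(rowRDD, recSchema)
recs.registerTempTable("recs")
```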
The best approach I've found to calculate percentiles in Spark is to leverage
SparkSQL. If you use the Hive Query Language support, you can use the UDAFs
for percentiles (as of Spark 1.2).
Something like this (note: syntax not guaranteed to run, but it should give
you the gist of what you need):
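A minimal sketch, assuming an existing SparkContext named sc and a registered table "recs" with an integer column intVal (the table and column names are hypothetical):

```scala
import org.apache.spark.sql.hive.HiveContext

// HiveContext gives access to Hive's built-in UDAFs, including
// percentile; "recs" and "intVal" are hypothetical names
val hiveCtx = new HiveContext(sc)
// percentile accepts an integral column and an array of percentiles
val quartiles = hiveCtx.sql(
  "SELECT percentile(intVal, array(0.25, 0.5, 0.75)) FROM recs")
quartiles.collect().foreach(println)
```

For floating-point columns, Hive also provides percentile_approx, which trades exactness for memory when the number of distinct values is large.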
When I looked at this last fall, the only way that seemed to be available was
to transform my data into SchemaRDDs, register them as tables, and then use
the Hive processor to calculate them with its built-in percentile UDFs that
were added in 1.2.
Curt
to submit this to a spark stand-alone cluster.
Any insights would be appreciated.
From: Thomas Demoor thomas.dem...@amplidata.com
Sent: Tuesday, January 27, 2015 4:41 AM
To: Kohler, Curt E (ELS-STL)
Cc: user@spark.apache.org
Subject: Re: Spark and S3 server side