Re: making dataframe for different types using spark-csv

2015-07-02 Thread Kohler, Curt E (ELS-STL)
You should be able to do something like this (assuming an input file formatted as: String, IntVal, LongVal):

import org.apache.spark.sql.types._

val recSchema = StructType(List(
  StructField("strVal", StringType, false),
  StructField("intVal", IntegerType, false),
  StructField("longVal", LongType, false)))

Re: Percentile example

2015-02-17 Thread Kohler, Curt E (ELS-STL)
The best approach I've found to calculate percentiles in Spark is to leverage Spark SQL. If you use the Hive Query Language support, you can use the UDAFs for percentiles (as of Spark 1.2). Something like this (note: syntax not guaranteed to run, but it should give you the gist of what you need).
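The actual query in the thread is truncated, but for illustration, here is what a percentile UDAF computes under one common definition (linear interpolation between the closest ranks). This is a plain-Python sketch of the computation, not Spark code:

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 1) of values,
    using linear interpolation between closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("empty input")
    # Fractional rank into the sorted list.
    rank = p * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    frac = rank - lo
    # Interpolate between the two neighboring values.
    return s[lo] + (s[hi] - s[lo]) * frac

print(percentile([1, 2, 3, 4, 5], 0.5))  # 3.0
print(percentile([1, 2, 3, 4], 0.5))     # 2.5
```

In Spark with Hive support, the equivalent work is pushed down to the Hive `percentile` / `percentile_approx` UDAFs via a SQL query against a registered table, rather than computed by hand like this.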

Re: Percentile Calculation

2015-01-28 Thread Kohler, Curt E (ELS-STL)
When I looked at this last fall, the only way that seemed to be available was to transform my data into SchemaRDDs, register them as tables, and then use the Hive processor to calculate the percentiles with its built-in percentile UDFs, which were added in 1.2. Curt

Re: Spark and S3 server side encryption

2015-01-28 Thread Kohler, Curt E (ELS-STL)
to submit this to a spark stand-alone cluster. Any insights would be appreciated.

From: Thomas Demoor thomas.dem...@amplidata.com
Sent: Tuesday, January 27, 2015 4:41 AM
To: Kohler, Curt E (ELS-STL)
Cc: user@spark.apache.org
Subject: Re: Spark and S3 server side
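The resolution isn't included in this excerpt, but for context, with the Hadoop s3a connector server-side encryption is typically enabled through a Hadoop configuration property rather than Spark itself. A sketch of a core-site.xml fragment, assuming a Hadoop version with s3a support and SSE-S3 (AES256) encryption:

```xml
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```

The same property can also be set per-job via `--conf spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256` when submitting to a standalone cluster; check the hadoop-aws documentation for the exact property names supported by your Hadoop version.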