Re: Support virtualenv in PySpark

2016-03-01 Thread Mohannad Ali
Hello Jeff, Well, this would also mean that you have to manage the same virtualenv (at the same path) on all nodes and install your packages into it, just as you would if you were installing them into the default Python path. In any case, at the moment you can already do what you proposed by creatin
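A minimal sketch of the "same virtualenv, same path on all nodes" approach, assuming the virtualenv already exists at an identical path on the driver and every worker. It relies only on the standard `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables; the helper names and the path `/opt/venvs/spark` are hypothetical.

```python
import os
from typing import Mapping, Optional


def venv_python(venv_dir: str) -> str:
    """Return the interpreter path inside a virtualenv (hypothetical helper).

    Assumes the standard layout: bin/python on POSIX, Scripts\\python.exe on Windows.
    """
    if os.name == "nt":
        return os.path.join(venv_dir, "Scripts", "python.exe")
    return os.path.join(venv_dir, "bin", "python")


def configure_pyspark_env(venv_dir: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `env` (default: os.environ) pointing PySpark at the venv.

    The path must resolve to the same interpreter on every node, because the
    workers launch whatever PYSPARK_PYTHON names on their own filesystem.
    """
    result = dict(os.environ if env is None else env)
    python = venv_python(venv_dir)
    # Interpreter used by the workers (and by the driver unless overridden).
    result["PYSPARK_PYTHON"] = python
    # Make the driver explicitly use the same interpreter.
    result["PYSPARK_DRIVER_PYTHON"] = python
    return result
```

The returned dict could then be passed as the `env` argument of `subprocess.run` when launching `spark-submit`; the input mapping is copied, so the caller's environment is left untouched.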

Sample sql query using pyspark

2016-03-01 Thread Maurin Lenglart
Hi, I am trying to get a sample of a SQL query in order to make the query run faster. My query looks like this: SELECT `Category` as `Category`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM groupon_dropbox WHERE `event_date` >= '2015-11-14' AND `event_date` <= '2016-02-19' GROUP B
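One common way to sample an aggregate like this is to keep each row independently with probability `fraction` and scale the resulting sums by `1/fraction`, so they estimate the full-table totals. The pure-Python sketch below only illustrates that idea; the row shape `(category, bookings, dealviews)` and the function name are hypothetical. In PySpark the sampling step itself would be something like `DataFrame.sample(False, fraction, seed)` before the `groupBy`.

```python
import random
from collections import defaultdict


def sampled_group_sums(rows, fraction, seed=None):
    """Estimate per-category sums of bookings and dealviews from a Bernoulli sample.

    rows: iterable of (category, bookings, dealviews) tuples (hypothetical shape).
    Each row is kept independently with probability `fraction`; the kept sums
    are scaled by 1/fraction so they estimate the totals over all rows.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    totals = defaultdict(lambda: [0.0, 0.0])
    for category, bookings, dealviews in rows:
        if rng.random() < fraction:  # Bernoulli trial per row
            totals[category][0] += bookings
            totals[category][1] += dealviews
    scale = 1.0 / fraction
    return {cat: (b * scale, d * scale) for cat, (b, d) in totals.items()}
```

With `fraction=1.0` every row is kept and the result equals the exact `GROUP BY` sums; smaller fractions trade accuracy for speed, and categories with few rows may be missing from the estimate entirely.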

Re: [Help]: DataframeNAfunction fill method throwing exception

2016-03-01 Thread ai he
Hi Divya, I guess the error is thrown from spark-csv. Spark-csv tries to parse the string "null" as a double. The workaround is to add the nullValue option, like .option("nullValue", "null"). But this nullValue feature is not included in the current spark-csv 1.3. Just check out the master of spark-csv and use
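To see why the option helps, here is a pure-Python sketch of what a nullValue-style setting does when a field is parsed as a double. It is not spark-csv's actual code; `parse_double` and its `null_value` parameter are hypothetical names used only for illustration.

```python
from typing import Optional


def parse_double(field: str, null_value: Optional[str] = None) -> Optional[float]:
    """Parse a CSV field as a double, mimicking a nullValue-style option.

    If `null_value` is set and the field matches it exactly, return None
    (standing in for a SQL NULL) instead of calling float("null"), which
    raises ValueError just like the failing double conversion.
    """
    if null_value is not None and field == null_value:
        return None
    return float(field)
```

Without `null_value`, the literal string "null" reaches the numeric parser and fails; with it, the token is mapped to a null before any conversion is attempted.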

Re: Spark on Windows platform

2016-03-01 Thread Sabarish Sasidharan
If all you want is Spark standalone, then it's as simple as installing the binaries and calling spark-submit, passing your main class. I would advise against running Hadoop on Windows; it's a bit of trouble. But yes, you can do it if you want to. Regards Sab On 29-Feb-2016 6:58 pm, "ga

Re: DataSet Evidence

2016-03-01 Thread Sabarish Sasidharan
BeanInfo? On 01-Mar-2016 6:25 am, "Steve Lewis" wrote: > I have a relatively complex Java object that I would like to use in a > dataset > > if I say > > Encoder evidence = Encoders.kryo(MyType.class); > > JavaRDD rddMyType= generateRDD(); // some code > > Dataset datasetMyType= sqlCtx.createDa
