Re: Sample sql query using pyspark

2016-03-01 Thread Maurin Lenglart
> Maurin, I don't know the technical reason why but: try removing the 'limit

Re: Sample sql query using pyspark

2016-03-01 Thread James Barney
Maurin, I don't know the technical reason why, but try removing the 'limit 100' part of your query. I was trying to do something similar the other week, and what I found is that each executor doesn't necessarily get the same 100 rows. Joins would fail or result in a bunch of nulls when keys
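A toy model of why 'limit 100' without an ORDER BY can pick different rows each time it is evaluated. It uses plain Python rather than Spark, so it only illustrates the idea. The limit keeps whichever rows arrive first, and the order in which partitions finish is not fixed. The partition layout and the shuffled completion order are assumptions made for illustration. A filter on a stable hash of the key, by contrast, selects the same rows on every evaluation, so both sides of a join keep matching keys.

```python
import random
import zlib

# Hypothetical data: 4 partitions of 50 integer keys each.
PARTITIONS = [list(range(i * 50, (i + 1) * 50)) for i in range(4)]

def limit_without_order(partitions, n, rng):
    """Toy LIMIT n: take rows from partitions in whatever order they 'finish'."""
    order = list(range(len(partitions)))
    rng.shuffle(order)  # completion order is not fixed between evaluations
    out = []
    for p in order:
        for row in partitions[p]:
            if len(out) == n:
                return out
            out.append(row)
    return out

def hash_sample(partitions, percent):
    """Deterministic sample: keep a key iff its stable CRC32 hash is below the cutoff."""
    return [row for part in partitions for row in part
            if zlib.crc32(str(row).encode()) % 100 < percent]

first = limit_without_order(PARTITIONS, 100, random.Random(1))
second = limit_without_order(PARTITIONS, 100, random.Random(2))
print(sorted(first) == sorted(second))  # can be False: the limit picked other partitions
print(hash_sample(PARTITIONS, 50) == hash_sample(PARTITIONS, 50))  # always True
```

In Spark SQL, a comparable deterministic filter could be written with the built-in `pmod` and `hash` functions. An example is `WHERE pmod(hash(key), 100) < 10`, where `key` is a hypothetical join column. Check that `hash` is available in your Spark version.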

Sample sql query using pyspark

2016-03-01 Thread Maurin Lenglart
Hi, I am trying to get a sample of a SQL query in order to make the query run faster. My query looks like this: SELECT `Category` as `Category`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM groupon_dropbox WHERE `event_date` >= '2015-11-14' AND `event_date` <= '2016-02-19' GROUP
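A minimal sketch of the usual way to speed up a SUM-style aggregate with a sample. It is written in plain Python so it runs without a cluster. Each row is kept independently with probability `fraction`, which is what PySpark's `DataFrame.sample(withReplacement=False, fraction=..., seed=...)` does. The sample is then aggregated, and each sum is divided by `fraction` to estimate the full-table total. The row layout, the data, and the `estimate_sums` helper are hypothetical. Only the column names `Category`, `bookings` and `dealviews` come from the query above.

```python
import random
from collections import defaultdict

def estimate_sums(rows, fraction, seed=42):
    """Bernoulli-sample rows, group by Category, and scale sums by 1/fraction."""
    rng = random.Random(seed)
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        if rng.random() < fraction:  # keep each row independently with prob. fraction
            acc = totals[row["Category"]]
            acc[0] += row["bookings"]
            acc[1] += row["dealviews"]
    # Estimate of each full-table sum: sample sum divided by the sampling fraction.
    return {cat: {"bookings": b / fraction, "dealviews": d / fraction}
            for cat, (b, d) in totals.items()}

# Hypothetical data: 10,000 rows split evenly across two categories.
rows = [{"Category": "travel" if i % 2 else "local", "bookings": 1, "dealviews": 10}
        for i in range(10_000)]
est = estimate_sums(rows, fraction=0.1)
print(est["local"]["bookings"])  # close to the true 5000, but not exact
```

In PySpark, the same idea would mean sampling the DataFrame before the `groupBy` and multiplying the aggregated sums by `1 / fraction`. The scaled totals are estimates and carry sampling error. A `LIMIT` returns arbitrary rows rather than a random sample, so scaling its sums this way would not give a sound estimate.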