Re: Spark SQL driver memory keeps rising

2016-06-16 Thread Khaled Hammouda
to a long stop-the-world GC pause. This should not happen on the machine running the driver program if all that you are doing is reading data from HDFS, performing a bunch of transformations and writing the result back into HDFS. > > > > Perhaps, the program is not actually using Spa

Spark SQL driver memory keeps rising

2016-06-14 Thread Khaled Hammouda
I'm having trouble with a Spark SQL job in which I run a series of SQL transformations on data loaded from HDFS. The first two stages load data from the HDFS input without issues, but later stages that require shuffles cause the driver memory to keep rising until it is exhausted, and then the driver

Re: Is there a limit on the number of tasks in one job?

2016-06-14 Thread Khaled Hammouda
ich.wordpress.com <http://talebzadehmich.wordpress.com/> > > > On 13 June 2016 at 20:45, Khaled Hammouda <khaled.hammo...@kik.com> wrote: > Hi Michael, > > Thanks for the suggestion to use Spark 2.0 preview. I just downloa

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Khaled Hammouda
Hi Michael, Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the preview and tried using it, but I’m running into the exact same issue. Khaled > On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote: > > You might try with the Spark 2.0 preview. We

Re: Are Spark Streaming RDDs always processed in order?

2015-07-06 Thread Khaled Hammouda
Great! That's what I gathered from the thread titled "Serial batching with Spark Streaming", but thanks for confirming this again. On 6 July 2015 at 15:31, Tathagata Das t...@databricks.com wrote: Yes, RDD of batch t+1 will be processed only after RDD of batch t has been processed. Unless there