RE: Spark SQL driver memory keeps rising

2016-06-16 Thread Mohammed Guller
From: Khaled Hammouda [mailto:khaled.hammo...@kik.com] Sent: Thursday, June 16, 2016 11:45 AM To: Mohammed Guller Cc: user Subject: Re: Spark SQL driver memory keeps rising I'm using pyspark and running in YARN client mode. I managed to ano

Re: Spark SQL driver memory keeps rising

2016-06-16 Thread Khaled Hammouda
I'm using pyspark and running in YARN client mode. I managed to anonymize the code a bit and pasted it below. You'll notice that I don't collect any output in the driver, instead the data is written to parquet directly. Also notice that I increased spark.driver.maxResultSize to 10g because the
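The snippet mentions YARN client mode and raising spark.driver.maxResultSize to 10g, but it does not show how the job is launched. The following is a minimal sketch that assumes the job is started with spark-submit. The helper name `build_submit_args` and the script name `job.py` are hypothetical. Note that spark.driver.maxResultSize caps the serialized results that actions such as collect send back to the driver, so it should matter little when output goes straight to parquet.

```python
import shlex


def build_submit_args(script, max_result_size="10g"):
    """Assemble a spark-submit argv for a PySpark job in YARN client mode.

    Hypothetical helper for illustration; the thread does not show the
    actual launch command.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",  # driver runs on the submitting machine
        # Limit on the total serialized results returned to the driver
        # by an action such as collect().
        "--conf", f"spark.driver.maxResultSize={max_result_size}",
        script,
    ]


if __name__ == "__main__":
    # "job.py" is a placeholder script name.
    print(shlex.join(build_submit_args("job.py")))
```

Passing the setting on the command line keeps it next to the deploy mode, which makes it easy to see what the driver was started with when comparing runs.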

Re: Spark SQL driver memory keeps rising

2016-06-15 Thread Mich Talebzadeh
You will need to be more specific about how you are using these parameters. Have you looked at the Spark web UI (default port 4040) to see the jobs and stages? The amount of shuffle is shown there as well. It also helps if you run jps on the OS and send the output of ps aux | grep <PID> as well. What sort of
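To track the driver's memory over time rather than eyeballing a single ps aux line, the resident set size can be pulled out of the ps aux output for the driver PID (in client mode, jps typically lists the driver JVM as SparkSubmit). This is a minimal sketch assuming the standard Linux procps column layout (USER PID %CPU %MEM VSZ RSS ...), where RSS is reported in KiB; the function name is hypothetical.

```python
def rss_kib_for_pid(ps_aux_output, pid):
    """Return the RSS (in KiB) of `pid` from `ps aux` text, or None if absent.

    Assumes the procps layout: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        # maxsplit=10 keeps the COMMAND column (which may contain spaces) intact
        fields = line.split(None, 10)
        if len(fields) >= 6 and fields[1] == str(pid):
            return int(fields[5])  # RSS column
    return None
```

For live use, the input can come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`; sampling it periodically shows whether the driver's footprint keeps climbing between stages.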

RE: Spark SQL driver memory keeps rising

2016-06-15 Thread Mohammed Guller
It would be hard to guess what could be going on without looking at the code. It looks like the driver program goes into a long stop-the-world GC pause. This should not happen on the machine running the driver program if all that you are doing is reading data from HDFS, performing a bunch of
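One way to confirm a suspected stop-the-world pause is to turn on GC logging for the driver JVM. The sketch below assumes a Java 8 JVM (typical in 2016), where -XX:+PrintGCDetails, -XX:+PrintGCDateStamps and -Xloggc:<file> are the usual flags; newer JDKs use unified logging (-Xlog:gc*) instead. In client mode the driver JVM is already running when application code executes, so the options go on the command line via --driver-java-options rather than into SparkConf. The helper name and log path are hypothetical.

```python
# Java 8 GC-logging flags; JDK 9+ replaces these with unified logging (-Xlog:gc*).
GC_LOG_FLAGS = [
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCDateStamps",
]


def driver_gc_java_options(log_path):
    """Build a --driver-java-options value that writes driver GC events to log_path."""
    return " ".join(GC_LOG_FLAGS + [f"-Xloggc:{log_path}"])
```

The resulting string would be passed as `--driver-java-options "<value>"` to spark-submit; long "Full GC" entries in the log line up with the pauses described here.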