You will need to be more specific about how you are using these parameters.

Have you looked at the Spark web UI (default port 4040) to see the jobs and
stages? The amount of shuffle read/write is also shown there.
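If the UI page itself becomes hard to reach, the same job and stage information is also exposed over the UI's REST API (the standard /api/v1 path; host and port below assume the defaults, and this needs the driver still running):

```shell
# Query the driver UI's REST API (default port 4040) for the application,
# then drill into its /jobs and /stages endpoints for shuffle figures.
curl http://localhost:4040/api/v1/applications
```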

It also helps if you run jps on the OS and send the output of ps aux | grep <PID>
as well.
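A minimal sketch of that sequence (the PID shown is hypothetical; the last line runs the same ps pattern against the current shell so the snippet is self-contained):

```shell
# List JVM processes with jps (it ships with the JDK) and note the driver's
# PID, then inspect that PID's full command line and memory columns (VSZ/RSS).
# A hypothetical session:
#   jps -lm          # e.g. "12345 org.apache.spark.deploy.SparkSubmit ..."
#   ps aux | grep 12345
# The same ps pattern, demonstrated here against this shell's own PID:
ps aux | grep "$$" | head -n 1
```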

What sort of resource manager are you using? Have you looked at yarn logs?
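For reference, pulling the aggregated YARN container logs looks like this (the application id below is a made-up example; list real ones with `yarn application -list`):

```shell
# Hypothetical application id -- substitute the one shown in the
# ResourceManager UI or by `yarn application -list`.
yarn logs -applicationId application_1466000000000_0001
```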

HTH


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 16 June 2016 at 03:23, Mohammed Guller <moham...@glassbeam.com> wrote:

> It would be hard to guess what could be going on without looking at the
> code. It looks like the driver program goes into a long stop-the-world GC
> pause. This should not happen on the machine running the driver program if
> all you are doing is reading data from HDFS, performing a bunch of
> transformations, and writing the results back to HDFS.
>
>
>
> Perhaps the program is not actually using Spark in cluster mode, but is
> running Spark in local mode?
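One quick way to confirm which mode a job is in is to look at the master URL (the value of spark.master, or spark-submit's --master flag). A hedged shell sketch of that check; the helper function here is only an illustration, not a Spark tool:

```shell
# Classify a Spark master URL as local vs cluster mode. In a real job the
# value comes from spark-submit's --master flag or the spark.master setting.
is_local_master() {
  case "$1" in
    local|local\[*\]) echo "local" ;;   # "local", "local[4]", "local[*]", ...
    *)                echo "cluster" ;; # "yarn", "spark://host:7077", "mesos://...", ...
  esac
}

is_local_master "local[*]"           # prints: local
is_local_master "yarn"               # prints: cluster
```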
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Khaled Hammouda [mailto:khaled.hammo...@kik.com]
> *Sent:* Tuesday, June 14, 2016 10:23 PM
> *To:* user
> *Subject:* Spark SQL driver memory keeps rising
>
>
>
> I'm having trouble with a Spark SQL job in which I run a series of SQL
> transformations on data loaded from HDFS.
>
>
>
> The first two stages load data from HDFS input without issues, but later
> stages that require shuffles cause the driver memory to keep rising until
> it is exhausted. Then the driver stalls, the Spark UI stops responding,
> and I can't even kill the driver with ^C; I have to forcibly kill the
> process.
>
>
>
> I think I'm allocating enough memory to the driver: driver memory is 44
> GB, and spark.driver.memoryOverhead is 4.5 GB. When I look at the memory
> usage, the driver memory before the shuffle starts is at about 2.4 GB
> (virtual memory size for the driver process is about 50 GB), and then once
> the stages that require shuffles start, I can see the driver memory rising
> fast to about 47 GB, at which point everything stops responding.
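For concreteness, those settings would typically be passed on submission like this (a sketch only: the script name and deploy mode are assumptions, and on Spark 1.x with YARN the overhead key was spark.yarn.driver.memoryOverhead rather than spark.driver.memoryOverhead):

```shell
# Sketch of the configuration described above; values are from the message
# (44 GB driver heap, 4.5 GB = 4608 MB overhead). Script name is a placeholder.
spark-submit \
  --deploy-mode cluster \
  --conf spark.driver.memory=44g \
  --conf spark.driver.memoryOverhead=4608m \
  transform_job.py
```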
>
>
>
> I'm not invoking any output operation that collects data at the driver. I
> just call .cache() on a couple of dataframes since they get used more than
> once in the SQL transformations, but those should be cached on the workers.
> Then I write the final result to a parquet file, but the job doesn't get to
> this final stage.
>
>
>
> What could possibly be causing the driver memory to rise that fast when no
> data is being collected at the driver?
>
>
>
> Thanks,
>
> Khaled
>