The SparkContext does not stop when a job finishes; it stops when you stop it. Memory can leak in many ways. Caching is one possibility - cached blocks do get evicted under pressure, but you should still clear caches explicitly once they are no longer needed.
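If explicit cache management turns out to be the issue, something along these lines usually helps. This is only a minimal sketch: the paths, the days list and the processing step are illustrative placeholders, not your actual job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-job").getOrCreate()

days_to_process = ["2022-03-01", "2022-03-02"]   # placeholder list of days

for day in days_to_process:
    df_day = spark.read.parquet(f"/data/input/{day}")   # hypothetical input layout
    df_day.cache()

    # ... the heavy per-day processing would go here ...
    df_day.count()   # stand-in action that materialises the cache

    df_day.unpersist()            # release this day's cached blocks explicitly
    spark.catalog.clearCache()    # or drop everything cached in this session at once

spark.stop()   # the context, and the executor memory it holds, is released only here

Relying on eviction alone means the previous day's blocks sit in executor memory until Spark needs the space; unpersisting as soon as a day is done keeps the footprint flat.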
I would guess it is something else your program holds on to in its logic. Also consider not looping; there is probably a faster way to do it in one go (a rough sketch follows the quoted message below).

On Wed, Mar 30, 2022, 10:16 AM Joris Billen <joris.bil...@bigindustries.be> wrote:

> Hi,
> I have a pyspark job submitted through spark-submit that does some heavy
> processing for 1 day of data. It runs with no errors. I have to loop over
> many days, so I run this Spark job in a loop. I notice that after a couple
> of executions the memory usage increases on all worker nodes and eventually
> this leads to failures. My job does some caching, but my understanding is
> that when the job ends successfully, the SparkContext is destroyed and the
> cache should be cleared. However, it seems that something keeps filling the
> memory a bit more after each run. This is the memory behaviour over time,
> which eventually starts leading to failures:
>
> (what we see is: green = physical memory used, green-blue = physical memory
> cached, grey = memory capacity, a straight line around 31 GB)
>
> This runs on a healthy Spark 2.4 and was already optimized into a stable
> job in terms of spark-submit resource parameters
> (driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
> Any clue how to “really” clear the memory in between jobs? Currently I can
> loop about 10 times and then need to restart my cluster so all memory is
> cleared completely.
>
> Thanks for any info!
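To make the "one go" suggestion above concrete, a rough sketch follows. It assumes the input is laid out as one directory per day and that the work can be expressed as a single grouped aggregation; both are assumptions about your data, not facts from your mail.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("all-days-job").getOrCreate()

# If the input is partitioned by day (e.g. /data/input/day=2022-03-01/...),
# one read picks up every day and the per-day work happens inside a single job.
df = spark.read.parquet("/data/input/")

result = (df
          .groupBy("day")                        # hypothetical partition column
          .agg(F.count("*").alias("rows")))      # stand-in for the real heavy processing

result.write.mode("overwrite").parquet("/data/output/")   # hypothetical output path

spark.stop()

One application that covers the whole date range avoids the repeated spark-submit start-up cost, and whatever the per-day loop was accumulating between runs never gets the chance to build up.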