Thanks for the answer - much appreciated! This forum is very useful :-)

I didn't know the SparkContext stays alive. I guess this is what is eating up 
memory. The eviction means that Spark knows it should clear some of the old 
cached memory to be able to store new data. In case anyone has good articles 
about memory leaks, I would be interested to read them.
I will try to add the following lines at the end of my job (as I cached the 
table in Spark SQL):


sqlContext.sql("UNCACHE TABLE mytableofinterest ")
spark.stop()
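
For reference, the same cleanup expressed through the PySpark catalog API would 
look roughly like this - a minimal sketch assuming a Spark 2.4 SparkSession 
named spark, with the table name taken from the snippet above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-job").getOrCreate()

# ... heavy processing for one day of data, including caching the table ...

# Drop the cached table explicitly (equivalent to the UNCACHE TABLE statement above)
spark.catalog.uncacheTable("mytableofinterest")

# Or drop every table/DataFrame cached in this session in one go
spark.catalog.clearCache()

# Release the driver and executor resources held by this application
spark.stop()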


Wrt looping: if I want to process 3 years of data, my modest cluster will never 
manage it in one go, I would expect? I have to break it down into smaller pieces 
and run them in a loop (1 day is already a lot of data).
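
For concreteness, the kind of driver-side loop I have in mind would look roughly 
like this - a sketch where the input path, the date range and process_day() are 
placeholders for my actual per-day logic, with the cache cleared at the end of 
every iteration instead of restarting the whole application:

from datetime import date, timedelta
from pyspark.sql import SparkSession

def process_day(df):
    # placeholder for the actual heavy per-day processing
    df.count()

spark = SparkSession.builder.appName("daily-backfill").getOrCreate()

day = date(2019, 1, 1)                 # illustrative start of the 3-year range
end = date(2022, 1, 1)
while day < end:
    # hypothetical daily-partitioned input path
    df = spark.read.parquet("/data/events/{}".format(day.isoformat()))
    df.cache()
    process_day(df)
    df.unpersist()                     # release this day's cached blocks
    spark.catalog.clearCache()         # drop anything else still cached
    day += timedelta(days=1)

spark.stop()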



Thanks!




On 30 Mar 2022, at 17:25, Sean Owen <sro...@gmail.com> wrote:

The Spark context does not stop when a job does. It stops when you stop it. 
There could be many ways memory can leak. Caching maybe - but it will evict. You 
should be clearing caches when they are no longer needed.

I would guess it is something else your program holds on to in its logic.

Also consider not looping; there is probably a faster way to do it in one go.

On Wed, Mar 30, 2022, 10:16 AM Joris Billen 
<joris.bil...@bigindustries.be> wrote:
Hi,
I have a pyspark job submitted through spark-submit that does some heavy 
processing for 1 day of data. It runs with no errors. I have to loop over many 
days, so I run this Spark job in a loop. I notice that after a couple of 
executions the memory usage is increasing on all worker nodes, and eventually 
this leads to failures. My job does some caching, but I understand that when the 
job ends successfully, the SparkContext is destroyed and the cache should be 
cleared. However, it seems that something keeps filling the memory a bit more 
after each run. This is the memory behaviour over time, which in the end will 
start leading to failures:
[attached chart: worker-node memory usage over time]

(what we see is: green = physical memory used, green-blue = physical memory 
cached, grey = memory capacity, a straight line around 31 GB)
This runs on a healthy Spark 2.4 cluster and was already optimized to reach a 
stable job in terms of spark-submit resource parameters 
(driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
Any clue how to “really” clear the memory in between jobs? Basically, at the 
moment I can loop 10x and then need to restart my cluster so all memory is 
cleared completely.


Thanks for any info!

