The SparkContext does not stop when a job finishes; it stops when you stop it. Memory can leak in many ways. Caching is one possibility - cached blocks do get evicted under pressure, but you should still clear caches explicitly once they are no longer needed.
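If explicit cache management turns out to be the issue, something along these lines usually helps. This is only a minimal sketch: the paths, the days list and the processing step are illustrative placeholders, not your actual job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-job").getOrCreate()

days_to_process = ["2022-03-01", "2022-03-02"]   # placeholder list of days

for day in days_to_process:
    df_day = spark.read.parquet(f"/data/input/{day}")   # hypothetical input layout
    df_day.cache()

    # ... the heavy per-day processing would go here ...
    df_day.count()   # stand-in action that materialises the cache

    df_day.unpersist()            # release this day's cached blocks explicitly
    spark.catalog.clearCache()    # or drop everything cached in this session at once

spark.stop()   # the context, and the executor memory it holds, is released only here

Relying on eviction alone means the previous day's blocks sit in executor memory until Spark needs the space; unpersisting as soon as a day is done keeps the footprint flat.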
I would guess it is something else your program holds on to in its logic. Also consider not looping; there is probably a faster way to do it in one go (a rough sketch follows the quoted message below).

On Wed, Mar 30, 2022, 10:16 AM Joris Billen <joris.bil...@bigindustries.be> wrote:

> Hi,
> I have a pyspark job submitted through spark-submit that does some heavy
> processing for 1 day of data. It runs with no errors. I have to loop over
> many days, so I run this Spark job in a loop. I notice that after a couple
> of executions the memory usage increases on all worker nodes and eventually
> this leads to failures. My job does some caching, but my understanding is
> that when the job ends successfully, the SparkContext is destroyed and the
> cache should be cleared. However, it seems that something keeps filling the
> memory a bit more after each run. This is the memory behaviour over time,
> which eventually starts leading to failures:
>
> (what we see is: green = physical memory used, green-blue = physical memory
> cached, grey = memory capacity, a straight line around 31 GB)
>
> This runs on a healthy Spark 2.4 and was already optimized into a stable
> job in terms of spark-submit resource parameters
> (driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
> Any clue how to “really” clear the memory in between jobs? Currently I can
> loop about 10 times and then need to restart my cluster so all memory is
> cleared completely.
>
> Thanks for any info!
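To make the "one go" suggestion above concrete, a rough sketch follows. It assumes the input is laid out as one directory per day and that the work can be expressed as a single grouped aggregation; both are assumptions about your data, not facts from your mail.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("all-days-job").getOrCreate()

# If the input is partitioned by day (e.g. /data/input/day=2022-03-01/...),
# one read picks up every day and the per-day work happens inside a single job.
df = spark.read.parquet("/data/input/")

result = (df
          .groupBy("day")                        # hypothetical partition column
          .agg(F.count("*").alias("rows")))      # stand-in for the real heavy processing

result.write.mode("overwrite").parquet("/data/output/")   # hypothetical output path

spark.stop()

One application that covers the whole date range avoids the repeated spark-submit start-up cost, and whatever the per-day loop was accumulating between runs never gets the chance to build up.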