It would be very difficult to tell without knowing what your application code is doing and what kinds of transformations/actions it performs. From my previous experience, tuning application code to avoid creating unnecessary objects reduces pressure on the GC.
On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman <keithgchap...@gmail.com> wrote:
> Hi,
>
> I'm benchmarking a Spark application by running it for multiple
> iterations. It's a benchmark that's heavy on shuffle, and I run it on a local
> machine with a very large heap (~200GB). The system has an SSD. When running
> for 3 to 4 iterations, I run out of disk space in the /tmp directory.
> On further investigation I was able to figure out that the reason is that
> the shuffle files are still around: because I have a very large heap, GC has
> not happened, and hence the shuffle files are not deleted. I was able to
> confirm this by lowering the heap size; I then see GC kicking in more often,
> and the size of /tmp stays under control. Is there any way I could configure
> Spark to handle this issue?
>
> One option I have is to make GC run more often by setting
> spark.cleaner.periodicGC.interval to a much lower value. Is there
> a cleaner solution?
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
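For reference, the knob Keith mentions can be set in `spark-defaults.conf`. This is a minimal sketch, not a recommended setting: the `5min` interval and the directory path are placeholders chosen for illustration. `spark.cleaner.periodicGC.interval` (default `30min`) controls how often Spark triggers a garbage collection on the driver so that the context cleaner can notice unreferenced shuffles and delete their files; in local mode the driver and executors share one JVM. `spark.local.dir` (default `/tmp`) sets where shuffle and other scratch files are written.

```properties
# Placeholder value: trigger a driver GC every 5 minutes instead of the
# default 30min, so unreferenced shuffle files become eligible for cleanup sooner.
spark.cleaner.periodicGC.interval   5min

# Placeholder path: write shuffle/scratch files to a volume with more free
# space than /tmp. This gives more headroom but does not delete files sooner.
spark.local.dir                     /path/to/large/volume/spark-tmp
```

Relocating `spark.local.dir` only buys headroom. A shorter GC interval is what actually lets the cleaner reclaim shuffle files during a long run with a large heap.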