Re: Worker Machine running out of disk for Long running Streaming process

2015-09-15 Thread gaurav sharma
Hi TD, sorry for the late reply. I implemented your suggestion, but unfortunately it didn't help: I can still see very old shuffle files, and because of them my long-running Spark job eventually gets terminated. Below is what I did. //This is the spark-submit job public class

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, could you shed some light on why this is and point to the relevant code in the Spark repo? It would help us better understand what can affect a long-running streaming job. On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote: Could you periodically
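For background on the question above: in Spark, the driver-side ContextCleaner keeps weak references to objects such as ShuffleDependency (as well as RDDs and broadcasts) and asks the executors to remove the associated shuffle data only after the driver's GC has collected those objects. If the driver heap is large and rarely gets a full collection, old shuffle dependencies can linger and their files stay on the workers' disks, which would explain why forcing a GC helps. The sketch below is a simplified illustration of that weak-reference pattern in plain Java, not Spark's actual code; `GcDrivenCleaner`, `register`, `cleanOne` and the integer resource id are hypothetical names.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

/**
 * Hypothetical, simplified illustration of GC-driven cleanup (not Spark's code):
 * each tracked object is held only through a WeakReference tagged with a resource id;
 * once the GC clears and enqueues that reference, the resource is deleted.
 */
public final class GcDrivenCleaner {
    /** Weak reference that remembers which resource (e.g. a shuffle id) to clean. */
    private static final class TaggedRef extends WeakReference<Object> {
        final int resourceId;

        TaggedRef(Object referent, int resourceId, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.resourceId = resourceId;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the TaggedRef objects themselves reachable until they are processed.
    private final Set<TaggedRef> pending = ConcurrentHashMap.newKeySet();
    private final IntConsumer deleteResource;

    public GcDrivenCleaner(IntConsumer deleteResource) {
        this.deleteResource = deleteResource;
    }

    /** Tracks `owner`; its resource is deleted only after `owner` has been garbage collected. */
    public void register(Object owner, int resourceId) {
        pending.add(new TaggedRef(owner, resourceId, queue));
    }

    /**
     * Waits up to timeoutMs (must be > 0; 0 would block forever) for one collected
     * reference and deletes its resource. Returns true if something was cleaned.
     */
    public boolean cleanOne(long timeoutMs) throws InterruptedException {
        Reference<?> ref = queue.remove(timeoutMs);
        if (ref == null) {
            return false; // nothing was collected in time: the resource stays on disk
        }
        TaggedRef tagged = (TaggedRef) ref;
        pending.remove(tagged);
        deleteResource.accept(tagged.resourceId);
        return true;
    }
}
```

In Spark the deletion step is a request to the block managers rather than a local callback, and a background thread drains the queue continuously; here `cleanOne` drains a single reference so the behaviour is easy to observe. The key point is the same: while the owning object is still uncollected on the driver, nothing gets cleaned.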

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-21 Thread Tathagata Das
Could you periodically (say, every 10 minutes) run System.gc() on the driver? Cleaning up shuffles is tied to garbage collection. On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com wrote: Hi All, I have a 24x7 running Streaming Process, which runs on 2 hour windowed
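A minimal sketch of this suggestion in plain Java, assuming the driver's main method can start a background thread before the streaming context is started. `PeriodicDriverGc` and its methods are hypothetical names, not a Spark API; the 10-minute period is the one suggested here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: periodically requests a full GC on the driver JVM. */
public final class PeriodicDriverGc {
    // Single daemon thread, so this scheduler alone never keeps the driver JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-driver-gc");
            t.setDaemon(true);
            return t;
        });

    /** Calls System.gc() once per period, with the first call after one full period. */
    public void start(long period, TimeUnit unit) {
        start(System::gc, period, unit);
    }

    /** Same schedule with an arbitrary task; handy for testing without forcing real GCs. */
    public void start(Runnable task, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(task, period, period, unit);
    }

    /** Cancels the schedule; call it once the streaming context has terminated. */
    public void stop() {
        scheduler.shutdownNow();
    }
}
```

In the driver, one would call `start(10, TimeUnit.MINUTES)` before starting the streaming context and `stop()` after `awaitTermination()` returns. Note that System.gc() is a no-op if the driver JVM runs with `-XX:+DisableExplicitGC`, in which case this approach cannot help.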

Worker Machine running out of disk for Long running Streaming process

2015-08-21 Thread gaurav sharma
Hi all, I have a streaming process running 24x7 on 2-hour windowed data. The issue I am facing is that my worker machines are running OUT OF DISK space: I checked, and the SHUFFLE FILES are not getting cleaned up.