Hi TD,
Sorry for the late reply.
I implemented your suggestion, but unfortunately it didn't help; I am
still able to see very old shuffle files, because of which my
long-running Spark job ultimately gets terminated.
Below is what I did.
//This is the spark-submit job
public class
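(The snippet above is truncated in the archive. As a stand-in, here is a minimal sketch of the periodic-GC approach being discussed, using a plain ScheduledExecutorService on the driver. The class name PeriodicGcDriver and the startPeriodicGc helper are illustrative, not from the original post.)

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGcDriver {
    // Schedules System.gc() on the driver at a fixed interval (TD suggests
    // every 10 minutes below). Spark's shuffle cleanup is driven by the
    // ContextCleaner, which reacts when the JVM garbage-collects the weak
    // references it holds to stale shuffles; forcing a GC on the driver
    // nudges that cleanup along so workers can delete old shuffle files.
    public static ScheduledExecutorService startPeriodicGc(long intervalMinutes) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            System::gc, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // Hypothetical wiring: in a real streaming job this would run
        // alongside the StreamingContext, e.g.
        //   ScheduledExecutorService gc = startPeriodicGc(10);
        //   streamingContext.start();
        //   streamingContext.awaitTermination();
        //   gc.shutdown();
        ScheduledExecutorService gcScheduler = startPeriodicGc(10);
        gcScheduler.shutdown();
    }
}
```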
Interesting. TD, can you please throw some light on why this is, and point
to the relevant code in the Spark repo? It will help in better understanding
the things that can affect a long-running streaming job.
On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote:
Could you periodically (say every 10 mins) run System.gc() on the driver.
Cleaning up of shuffle files is tied to garbage collection.
On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com
wrote:
Hi All,
I have a 24x7 running Streaming Process, which runs on 2-hour windowed data.
The issue I am facing is that my worker machines are running OUT OF DISK space.
I checked that the SHUFFLE FILES are not getting cleaned up.