Re: Blockmgr directories intermittently not being cleaned up

2018-06-20 Thread tBoyle
I'm experiencing the same behaviour, with shuffle data being orphaned on disk (Spark 2.0.1 with Spark Streaming). We are using AWS R4 EC2 instances with 300 GB EBS volumes attached; most spilled shuffle data is eventually cleaned up by the ContextCleaner within 10 minutes. We do not use the

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Jeff Frylings
The logs are not the problem; it is the shuffle files that are not being cleaned up. We do have the configs for log rolling and that is working just fine.
ex: /mnt/blockmgr-d65d4a74-d59a-4a06-af93-ba29232f7c5b/31/shuffle_1_46_0.data
> On May 30, 2018, at 9:54 AM, Ajay wrote:
>
> I have used
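To see which shuffle files are lingering, a small script can walk the blockmgr-* directories and report files older than a chosen age. The following is only a minimal sketch: the function name `find_stale_shuffle_files`, the age threshold, and the assumption that leftover files match `shuffle_*` under `blockmgr-*` (as in the example path above) are illustrative and not part of Spark. It only lists files and deletes nothing.

```python
import time
from pathlib import Path


def find_stale_shuffle_files(root, max_age_seconds, now=None):
    """List shuffle files under root/blockmgr-*/ older than max_age_seconds.

    Hypothetical helper for diagnosis only; it never deletes anything.
    Returns (path, age_in_seconds) tuples, oldest first.
    """
    now = time.time() if now is None else now
    stale = []
    for block_dir in Path(root).glob("blockmgr-*"):
        if not block_dir.is_dir():
            continue
        # Shuffle files sit in numbered subdirectories, e.g. 31/shuffle_1_46_0.data
        for candidate in block_dir.rglob("shuffle_*"):
            if not candidate.is_file():
                continue
            age = now - candidate.stat().st_mtime
            if age > max_age_seconds:
                stale.append((str(candidate), age))
    # Oldest files first, since those are the most likely orphans
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For example, `find_stale_shuffle_files("/mnt", 6 * 3600)` would list shuffle files under /mnt that have not been modified for six hours. Whether such a file is truly orphaned still depends on whether the owning application is still running, so the output is a starting point for investigation rather than a deletion list.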

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Ajay
I have used these configs in the paths to clean up the executor logs.
    .set("spark.executor.logs.rolling.time.interval", "minutely")
    .set("spark.executor.logs.rolling.strategy", "time")
    .set("spark.executor.logs.rolling.maxRetainedFiles", "1")
On Wed, May 30, 2018 at 8:49 AM

Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Jeff Frylings
Intermittently on Spark executors we are seeing blockmgr directories not being cleaned up after execution, and they are filling up the disk. These executors are using Mesos dynamic resource allocation, and no single app using an executor seems to be the culprit. Sometimes an app will run and be cleaned