I'm experiencing the same behaviour with shuffle data being orphaned on disk
(Spark 2.0.1 with Spark Streaming).
We are using AWS R4 EC2 instances with 300GB EBS volumes attached; most
spilled shuffle data is eventually cleaned up by the ContextCleaner within
10 minutes. We do not use the
The logs are not the problem; it is the shuffle files that are not being
cleaned up. We do have the configs for log rolling and that is working just
fine.
ex: /mnt/blockmgr-d65d4a74-d59a-4a06-af93-ba29232f7c5b/31/shuffle_1_46_0.data
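For what it's worth, the ContextCleaner only removes shuffle files once the
corresponding shuffle dependency has been garbage-collected on the driver, so
on a long-running streaming driver with little GC pressure the files can sit
around for a while. A minimal sketch of the driver-side settings involved
(the values here are only illustrative, not a recommendation for any
particular workload):

    // Sketch only: standard SparkConf settings for the driver-side cleaner.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cleaner-tuning-example") // placeholder app name
      // Reference tracking is on by default; the ContextCleaner does nothing without it.
      .set("spark.cleaner.referenceTracking", "true")
      // Force a periodic driver GC (default 30min) so weakly referenced shuffle
      // dependencies are collected and their files can be deleted on the executors.
      .set("spark.cleaner.periodicGC.interval", "5min")

    val sc = new SparkContext(conf)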
> On May 30, 2018, at 9:54 AM, Ajay wrote:
>
I have used these configs in the past to clean up the executor logs.
.set("spark.executor.logs.rolling.time.interval", "minutely")
.set("spark.executor.logs.rolling.strategy", "time")
.set("spark.executor.logs.rolling.maxRetainedFiles", "1")
On Wed, May 30, 2018 at 8:49 AM
Intermittently on Spark executors we are seeing blockmgr directories not being
cleaned up after execution, and they are filling up the disk. These executors are using
Mesos dynamic resource allocation and no single app using an executor seems to
be the culprit. Sometimes an app will run and be cleaned