Greg Mann created MESOS-9954:
--------------------------------
Summary: Flapping tasks with large sandboxes can fill agent disk
Key: MESOS-9954
URL: https://issues.apache.org/jira/browse/MESOS-9954
Project: Mesos
Issue Type: Bug
Reporter: Greg Mann
If a task on an agent is repeatedly relaunched after failing and pulls a large
artifact into its sandbox on each launch, the accumulated sandboxes can quickly
fill the agent disk. This may happen on a time scale shorter than the disk watch
interval, so the disk fills up before the agent next checks disk usage.
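To make the time scales concrete, here is a rough back-of-the-envelope sketch.
All numbers are hypothetical (100 GiB free, a 5 GiB artifact, a relaunch every
2 seconds, and an assumed 60-second disk watch interval); they are not
measurements from a real cluster.

```python
def seconds_until_full(free_bytes, sandbox_bytes, relaunch_period_s):
    """Estimate the time until the disk fills, assuming every relaunch
    leaves one more sandbox of the same size behind."""
    launches = free_bytes // sandbox_bytes  # whole sandboxes that still fit
    return launches * relaunch_period_s


GiB = 1024 ** 3

# Hypothetical values, chosen only for illustration.
time_to_fill = seconds_until_full(100 * GiB, 5 * GiB, 2)
disk_watch_interval_s = 60  # assumed interval between disk usage checks

print(time_to_fill)                          # seconds until the disk is full
print(time_to_fill < disk_watch_interval_s)  # True means the disk fills first
```

With these assumed numbers the disk is full after 20 launches, before a single
disk usage check has had a chance to run.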
We should evaluate solutions to this issue. A couple of options:
* Perhaps an aggressive (short) disk watch interval is sufficient? We should
investigate the performance impact of this approach.
* If that is not sufficient, then maybe polling free disk space whenever a
task is launched makes sense? (Rate-limiting this might be necessary.)
* Perhaps we can come up with some fundamentally different approach for
detecting free disk space which would solve this issue?
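The second option (polling free disk space at task launch, with rate limiting)
could look roughly like the sketch below. This is not Mesos agent code: the
class name `LaunchDiskCheck`, the 10% free-space threshold, and the 5-second
minimum poll interval are all hypothetical, and what the agent should do when
the check fails (trigger sandbox cleanup, delay the launch, etc.) is left open.

```python
import shutil
import time


class LaunchDiskCheck:
    """Rate-limited free-space check meant to be called on each task launch.

    Hypothetical helper for illustration only. The clock and usage functions
    are injectable so the rate limiting can be exercised without a real disk.
    """

    def __init__(self, path, min_free_fraction=0.1, min_interval_s=5.0,
                 clock=time.monotonic, usage=shutil.disk_usage):
        self._path = path
        self._min_free_fraction = min_free_fraction
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._usage = usage
        self._last_poll = None  # time of the last real poll, None if never
        self._last_ok = True    # cached result of the last real poll

    def ok_to_launch(self):
        """Return True if enough disk was free at the most recent poll.

        The filesystem is polled at most once per min_interval_s; calls in
        between reuse the cached result, so a flapping task cannot turn
        every relaunch into a disk usage query.
        """
        now = self._clock()
        if (self._last_poll is None
                or now - self._last_poll >= self._min_interval_s):
            u = self._usage(self._path)
            self._last_ok = (u.free / u.total) >= self._min_free_fraction
            self._last_poll = now
        return self._last_ok


if __name__ == "__main__":
    check = LaunchDiskCheck(".")
    print(check.ok_to_launch())
```

The trade-off is the one noted in the bullet: a longer minimum interval keeps
the check cheap, but reopens a (smaller) window in which relaunches can fill
the disk using a stale result.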
--
This message was sent by Atlassian Jira
(v8.3.2#803003)