I believe the root of the Docker filesystem for any given container goes in
/var/lib/docker/something/container_id/... on the host filesystem.

The gc executor cleans out the sandbox directory, but anything written
anywhere else will stick around on the host filesystem until docker rm is
called, which Mesos does after DOCKER_REMOVE_DELAY = 6h. [1]

[1]
https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/src/slave/constants.cpp#L52
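
For illustration, here's a rough (untested) sketch of checking what a task
left behind in the container's writable layer -- i.e. the writes that only
go away at docker rm time. The container name is made up:

    # List files a task wrote outside /mnt/mesos/sandbox; these sit in the
    # container's writable layer under /var/lib/docker until `docker rm`.
    import subprocess

    container = "mesos-20150409-example"  # hypothetical container name/ID
    diff = subprocess.check_output(["docker", "diff", container])
    for line in diff.decode().splitlines():
        # Lines look like "A /scratch/out.dat" (A=added, C=changed, D=deleted)
        print(line)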

I think the takeaway here is to use /mnt/mesos/sandbox as your scratch
space, since that's all that Thermos watches. Mesos is not merciful if you
anger it.
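
For what it's worth, in our configs that boils down to pointing all scratch
output at the sandbox, something like this (a sketch only -- image, sizes,
and role are made up, and I'm going from memory on the container stanza):

    # The process's working directory is the sandbox; keep temp/scratch data
    # there so Thermos can meter it against the task's disk quota.
    main_proc = Process(
      name = 'compute',
      # $MESOS_SANDBOX is /mnt/mesos/sandbox inside the Docker container.
      cmdline = 'mkdir -p $MESOS_SANDBOX/scratch && '
                'java -Djava.io.tmpdir=$MESOS_SANDBOX/scratch -jar app.jar'
    )

    task = Task(
      processes = [main_proc],
      resources = Resources(cpu = 1.0, ram = 2*GB, disk = 10*GB)
    )

    jobs = [Job(
      cluster = 'devcluster', role = 'www-data', environment = 'prod',
      name = 'scratch_in_sandbox', task = task,
      container = Container(docker = Docker(image = 'example/java-app'))
    )]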

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard


On 9 April 2015 at 12:25, Bill Farner <[email protected]> wrote:

> I'm fairly ignorant to some of the practicalities here - if you don't write
> to /mnt/mesos/sandbox, where do files land?  Some other ephemeral directory
> that dies with the container?
>
> -=Bill
>
> On Thu, Apr 9, 2015 at 7:11 AM, Hussein Elgridly <[email protected]> wrote:
>
> > Thanks, that's helpful. I've also just discovered that Thermos only
> > monitors disk usage in the sandbox location, so if we launch a Docker job
> > and write to anywhere that's not /mnt/mesos/sandbox, we can exceed our
> > disk quota. I can work around this by turning our scratch space
> > directories into symlinks located under the sandbox, though.
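> >
> > Concretely, something like this in the job config (a rough sketch; the
> > /app/scratch path and the process wiring are made up):
> >
> >     # Hypothetical setup step: replace the image's scratch dir with a
> >     # symlink into the sandbox so Thermos counts the disk usage.
> >     link_scratch = Process(
> >       name = 'link_scratch',
> >       cmdline = 'mkdir -p $MESOS_SANDBOX/scratch && '
> >                 'rm -rf /app/scratch && '
> >                 'ln -s $MESOS_SANDBOX/scratch /app/scratch'
> >     )
> >
> >     task = SequentialTask(
> >       processes = [link_scratch, main_proc],  # main_proc defined elsewhere
> >       resources = Resources(cpu = 1.0, ram = 2*GB, disk = 10*GB)
> >     )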
> >
> > Hussein Elgridly
> > Senior Software Engineer, DSDE
> > The Broad Institute of MIT and Harvard
> >
> >
> > On 8 April 2015 at 19:43, Zameer Manji <[email protected]> wrote:
> >
> > > Hey,
> > >
> > > The deletion of sandbox directories is done by the Mesos slave, not the
> > > GC executor. You will have to ask the Mesos devs about the relationship
> > > between low disk and sandbox deletion.
> > >
> > > The executor enforces disk usage by running `du` in the background
> > > periodically. I suspect in your case your process fails before the
> > > executor notices the disk usage has been exceeded and marks the task as
> > > failed. This explains why the disk usage message is not there.
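> > >
> > > Roughly this idea (just a sketch, not the actual Thermos code; the
> > > limit and interval are made up):
> > >
> > >     # Periodic du-based accounting of the sandbox, similar in spirit to
> > >     # the executor's disk collector. A fast writer can blow well past
> > >     # the limit (or fill the host disk) between samples.
> > >     import subprocess, time
> > >
> > >     SANDBOX = '/mnt/mesos/sandbox'
> > >     DISK_LIMIT_BYTES = 10 * 1024 ** 3  # whatever the task reserved
> > >
> > >     while True:
> > >         out = subprocess.check_output(['du', '-sb', SANDBOX])
> > >         used = int(out.decode().split()[0])
> > >         if used > DISK_LIMIT_BYTES:
> > >             print('Disk limit exceeded. Reserved %d bytes vs used %d bytes.'
> > >                   % (DISK_LIMIT_BYTES, used))
> > >             break  # the real executor fails the task at this point
> > >         time.sleep(60)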
> > >
> > > I'm not sure why the finalizers are not running, but you should note
> > > that they are best effort by the executor. The executor won't be able to
> > > run them if Mesos tears down the container from underneath it, for
> > > example.
> > >
> > > On Mon, Apr 6, 2015 at 10:30 AM, Hussein Elgridly <[email protected]> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I've just had my first task fail due to exceeding disk capacity, and
> > > > I've run into some strange behaviour.
> > > >
> > > > It's a Java process that's running inside a Docker container specified
> > > > in the task config. The Java process is failing with
> > > > java.io.IOException: No space left on device when attempting to write
> > > > a file.
> > > >
> > > > Three things are (or aren't) then happening which I think are just
> > > > plain wrong:
> > > >
> > > > 1. The task is being marked as failed (good!) but isn't reporting that
> > > > it exceeded disk limits (bad). I was expecting to see the "Disk limit
> > > > exceeded. Reserved X bytes vs used Y bytes." message, but neither the
> > > > Mesos nor the Aurora web interface is telling me this.
> > > > 2. The task's sandbox directory is being nuked. All of it, immediately.
> > > > It was there while the job was running and vanished as soon as it
> > > > failed (I happened to be watching it live). This makes debugging
> > > > difficult, and the Aurora/Thermos web UI clearly has trouble because
> > > > it reports the resource requests as all zero when they most definitely
> > > > weren't.
> > > > 3. Finalizers aren't running. No finalizers = no error log = no
> > > > debugging = sadface. :(
> > > >
> > > > I think what's actually happening here is that the process is running
> > > > out of disk on the machine itself and that IOException is propagating
> > > > up from the kernel, rather than Mesos killing the process from its
> > > > disk usage monitoring.
> > > >
> > > > As such, we're going to try configuring the Mesos slaves with
> > > > --resources='disk:some_smaller_value' to leave a little overhead, in
> > > > the hope that the Mesos disk monitor catches the overuse before the
> > > > process attempts to claim the last free block on disk.
> > > >
> > > > I don't know why it'd be nuking the sandbox, though. And is the GC
> > > > executor more aggressive about cleaning out old sandbox directories if
> > > > the disk is low on free space?
> > > >
> > > > If it helps, we're on Aurora commit
> > > > 2bf03dc5eae89b1e40bfd47683c54c185c78a9d3.
> > > >
> > > > Thanks,
> > > >
> > > > Hussein Elgridly
> > > > Senior Software Engineer, DSDE
> > > > The Broad Institute of MIT and Harvard
> > > >
> > > > --
> > > > Zameer Manji
> > > >
> > > >
> > >
> >
>
