[ https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851814#comment-15851814 ]

Jie Yu commented on MESOS-6563:
-------------------------------

[~xujyan] I thought about PERSISTENT_VOLUME as a source too, and I'd like to move in that direction as well! The tricky part is that we need a way to support updating the volumes of a container after it has started. For instance, an executor is running and a new task is sent to that executor with some persistent volumes. If we only specify persistent volumes in volume.source, how would the user specify this additional volume in the TaskInfo?
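
To make the question concrete, here is a toy sketch (plain structs rather than the real protobufs; the PERSISTENT_VOLUME source type and the persistence id field below are hypothetical, not in the current protos) of a container whose volume list has to grow when a later task arrives:

{noformat}
// Toy model only: plain structs standing in for the hypothetical protobuf
// changes. PERSISTENT_VOLUME and persistenceId do not exist today.
#include <iostream>
#include <string>
#include <vector>

struct VolumeSource
{
  // PERSISTENT_VOLUME is the proposed addition being discussed.
  enum class Type { SANDBOX_PATH, PERSISTENT_VOLUME };

  Type type;
  std::string persistenceId;  // Would refer to Resource.DiskInfo.Persistence.id.
};

struct Volume
{
  std::string containerPath;
  VolumeSource source;
};

int main()
{
  // Volumes the container starts with (from the first task).
  std::vector<Volume> containerVolumes = {
    {"/data", {VolumeSource::Type::PERSISTENT_VOLUME, "volume-1"}}
  };

  // A second task later lands on the same (already running) executor and
  // brings one more persistent volume; the open question is how its TaskInfo
  // expresses this so the running container's volumes can be updated.
  Volume additional = {"/data2", {VolumeSource::Type::PERSISTENT_VOLUME, "volume-2"}};
  containerVolumes.push_back(additional);

  for (const Volume& volume : containerVolumes) {
    std::cout << volume.containerPath
              << " <- persistence id " << volume.source.persistenceId
              << std::endl;
  }

  return 0;
}
{noformat}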

> Shared Filesystem Isolator does not clean up mounts
> ----------------------------------------------------
>
>                 Key: MESOS-6563
>                 URL: https://issues.apache.org/jira/browse/MESOS-6563
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>            Reporter: David Robinson
>            Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that
> mounts are not unmounted; agents ended up with thousands of mounts, one for
> each task that has run.
> To reproduce the problem, start a Mesos agent with
> --isolation="filesystem/shared" and
> --default_container_info="file:///tmp/the-container-info-below.json", then
> launch and kill several tasks. After the tasks are killed the mount points
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
>   "type": "MESOS",
>   "volumes": [
>     {
>       "container_path": "/tmp",
>       "host_path": "tmp",
>       "mode": "RW"
>     }
>   ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the
> process exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically done by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
>     const ContainerID& containerId,
>     const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have thousands of dangling
> mounts, all of them attributed to the Mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-4eab-bbfa-883ca43744ad/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-0000000019-0000/executors/thermos-drobinson-test-sleep2-0-5d6bc25a-6ba7-48f9-9655-85da6ff0a383/runs/d5cc4b31-7876-4bca-b1fa-b177c5d88bfc/tmp xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/mounts
> 4950
> root[7]server-001 ~ # pgrep -f /usr/local/bin/mesos-slave
> 27799
> root[7]server-001 ~ # wc -l /proc/27799/mounts
> 5079 /proc/27799/mounts
> root[7]server-001 ~ # grep -c 'drobinson-test-sleep2' /proc/27799/mounts
> 4950
> root[7]server-001 ~ # ps auxww | grep 'drobinson-test-sleep2' -c
> 5
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)