----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67264/#review204566 -----------------------------------------------------------
Ship it! Ship It! src/slave/gc.cpp Lines 225-227 (patched) <https://reviews.apache.org/r/67264/#comment287165> This works, but if you wanna be consistent with the rest of how reverse iteration of the mount entries, the following could be considered: ``` foreach (const MountTable::Entry& entry, adaptor::reverse(mountTable->entries)) { ``` This is used in `src/linux/fs.cpp` and a couple of other places. - Jason Lai On June 11, 2018, 8:58 p.m., Zhitao Li wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67264/ > ----------------------------------------------------------- > > (Updated June 11, 2018, 8:58 p.m.) > > > Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu. > > > Bugs: MESOS-8830 > https://issues.apache.org/jira/browse/MESOS-8830 > > > Repository: mesos > > > Description > ------- > > In various corner cases, agent may not get chance to properly unmount > persistent volumes mounted inside an executor's sandbox. When GC later > gets to these sandbox directories, permanent data loss can happen (see > MESOS-8830). > > Currently, the only mounts in the host mount namespace under the sandbox > directories are persistent volumes, so this diff added protection to > unmount any dangling mount points before calling `rmdir` on the > directory. > > NOTE: this means agent will not garbage collect any path if it cannot > read its own `mountinfo` table. > > > Diffs > ----- > > src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78 > src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088 > src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106 > src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea > src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23 > src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48 > src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126 > src/tests/cluster.cpp 01eb0950e687227dac81b1cdb9eaba3379cf5dbb > src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031 > src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2 > src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496 > src/tests/slave_tests.cpp 3d67511de5abd3466eeb5ad1daf318209bd69eed > > > Diff: https://reviews.apache.org/r/67264/diff/6/ > > > Testing > ------- > > Added a unit test in following patch. > > Tested with following procedures: > 1. Start a test master and agent; > 2. Created a persistent volume on agent through operator API; > 3. Use `mesos-execute` to run a task; > 4. Stop the agent; > 5. Manually bind mount persistent volume path into a `volume` directory > inside the executor sandbox (to simulate a dangling mount in MESOS-8830); > 6. Restart agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it > gc the path immediately. > > With this fix, we observed that the dangling mount is automatically cleaned > up, and agent produces log line: > ``` > W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point > '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume' > of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1' > inside garbage collected path > '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0' > ``` > > > Thanks, > > Zhitao Li > >