[ https://issues.apache.org/jira/browse/MESOS-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907592#comment-15907592 ]
Vinod Kone commented on MESOS-7234:
-----------------------------------

[~gkleiman] and I looked into this, and our hypothesis is that inside `containerizer::reap()` the lambda that reads the container status file is racing with the init/nanny process that writes to it.

[~klueska] and [~jieyu], is this a known issue? If so, is there a fix planned?

One solution might be for `Launcher::fork()` to return both the container pid and the (optional) init pid, and for the containerizer to reap the init pid instead of the container pid when the init pid is set (`Some`).

> NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested test is flaky
> --------------------------------------------------------------------
>
>                 Key: MESOS-7234
>                 URL: https://issues.apache.org/jira/browse/MESOS-7234
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, technical debt, test
>         Environment: Fedora 24
>            Reporter: Gastón Kleiman
>              Labels: mesos
>
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from NestedMesosContainerizerTest
> [ RUN      ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested
> I0313 09:39:40.803444 1701 containerizer.cpp:221] Using isolation: cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0313 09:39:40.811974 1701 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0313 09:39:40.812714 1701 provisioner.cpp:249] Using default backend 'overlay'
> I0313 09:39:40.827086 1739 containerizer.cpp:608] Recovering containerizer
> I0313 09:39:40.829702 1737 provisioner.cpp:410] Provisioner recovery complete
> I0313 09:39:40.830343 1744 containerizer.cpp:1001] Starting container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 for executor 'executor' of framework
> I0313 09:39:40.834182 1738 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 1) for container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:40.836853 1735 linux_launcher.cpp:429] Launching container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
> I0313 09:39:40.861481 1734 containerizer.cpp:1598] Checkpointing container's forked pid 1856 to '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_Sf4Iv8/meta/slaves/frameworks/executors/executor/runs/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/pids/forked.pid'
> I0313 09:39:40.867455 1733 containerizer.cpp:1766] Starting nested container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:40.870790 1742 linux_launcher.cpp:429] Launching nested container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
> I0313 09:39:45.173310 1737 containerizer.cpp:2483] Container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e has exited
> I0313 09:39:45.173354 1737 containerizer.cpp:2077] Destroying container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e in RUNNING state
> I0313 09:39:45.173630 1731 linux_launcher.cpp:505] Asked to destroy container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.174485 1731 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.177196 1744 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.179316 1736 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e after 2.063104ms
> I0313 09:39:45.181565 1733 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.183686 1746 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e after 2.074112ms
> I0313 09:39:45.187661 1738 containerizer.cpp:2356] Checkpointing termination state to nested container's runtime directory '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_kUGOVz/containers/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/containers/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e/termination'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:230: Failure
> Expecting WIFEXITED(wait.get()->status()) but WIFSIGNALED(wait.get()->status()) is true and WTERMSIG(wait.get()->status()) is Killed
> I0313 09:39:45.188985 1737 containerizer.cpp:2077] Destroying container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 in RUNNING state
> I0313 09:39:45.189237 1734 linux_launcher.cpp:505] Asked to destroy container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.189946 1734 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.191498 1744 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
> I0313 09:39:45.191536 1734 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.297771 1735 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8 after 106.187776ms
> I0313 09:39:45.297827 1745 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos after 106.283008ms
> I0313 09:39:45.300673 1746 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
> I0313 09:39:45.301230 1733 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.303532 1742 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos after 2.814976ms
> I0313 09:39:45.304054 1741 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8 after 2.78272ms
> I0313 09:39:45.374223 1733 containerizer.cpp:2483] Container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 has exited
> I0313 09:39:45.398809 1731 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b
> I0313 09:39:45.400909 1742 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b after 2.054144ms
> I0313 09:39:45.403110 1739 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b
> I0313 09:39:45.405185 1733 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b after 2.034176ms
> [  FAILED  ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested (4682 ms)
> [----------] 1 test from NestedMesosContainerizerTest (4683 ms total)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
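The comment proposes that the containerizer reap the init pid instead of the container pid whenever an init pid exists. Below is a minimal sketch of that idea in C++. It is a hypothetical illustration, not Mesos code. `ForkResult`, `pidToReap()`, `describe()` and `runAndReap()` are invented names. A plain blocking `waitpid()` stands in for however the containerizer actually reaps. `describe()` decodes a wait status with the same `WIFEXITED`/`WIFSIGNALED`/`WTERMSIG` macros that the failing assertion checks.

```cpp
#include <cerrno>
#include <optional>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result of a fork() that also reports an optional init pid.
// This is NOT the real Mesos Launcher::fork() signature; names are illustrative.
struct ForkResult {
  pid_t containerPid;
  std::optional<pid_t> initPid;  // set only when an init/nanny process was started
};

// The proposal from the comment: reap the init pid when it is set,
// otherwise fall back to the container pid.
pid_t pidToReap(const ForkResult& result) {
  return result.initPid ? *result.initPid : result.containerPid;
}

// Decode a wait status with the same macros the failing assertion uses.
std::string describe(int status) {
  if (WIFEXITED(status)) {
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "killed by signal " + std::to_string(WTERMSIG(status));
  }
  return "unknown status";
}

// Fork a child that either exits normally or is killed by SIGKILL, then
// reap it with a blocking waitpid() (retrying on EINTR) and describe the result.
std::string runAndReap(bool killChild) {
  pid_t pid = fork();
  if (pid < 0) {
    return "fork failed";
  }
  if (pid == 0) {
    // Child: only async-signal-safe calls between fork() and exit.
    if (killChild) {
      raise(SIGKILL);
    }
    _exit(0);
  }

  ForkResult result{pid, std::nullopt};  // no separate init process in this toy
  int status = 0;
  pid_t reaped;
  do {
    reaped = waitpid(pidToReap(result), &status, 0);
  } while (reaped < 0 && errno == EINTR);

  if (reaped != pid) {
    return "waitpid failed";
  }
  return describe(status);
}
```

`runAndReap(false)` returns "exited with code 0". `runAndReap(true)` returns "killed by signal 9" on Linux, where `SIGKILL` is 9. That second result has the same status shape the test reported (`WTERMSIG ... is Killed`). In this toy the init pid is never set, so `pidToReap()` only changes behavior when a caller supplies one, for example `pidToReap(ForkResult{100, 200})` yields 200.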