[ 
https://issues.apache.org/jira/browse/MESOS-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907592#comment-15907592
 ] 

Vinod Kone commented on MESOS-7234:
-----------------------------------

[~gkleiman] and I looked into this and our hypothesis is that inside 
`containerizer::reap()` the lambda that reads the container status file is 
racing with init/nanny process that writes to it. [~klueska] and [~jieyu] is 
this a known issue? If yes, is there a fix planned? One solution might be for 
`Launcher::fork()` to return both the container pid and the (optional) init pid 
and making the containerizer do a reap on the init pid instead of the container 
pid when init pid is some.

> NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested test is flaky
> --------------------------------------------------------------------
>
>                 Key: MESOS-7234
>                 URL: https://issues.apache.org/jira/browse/MESOS-7234
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, technical debt, test
>         Environment: Fedora 24
>            Reporter: Gastón Kleiman
>              Labels: mesos
>
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from NestedMesosContainerizerTest
> [ RUN      ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested
> I0313 09:39:40.803444  1701 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0313 09:39:40.811974  1701 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0313 09:39:40.812714  1701 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0313 09:39:40.827086  1739 containerizer.cpp:608] Recovering containerizer
> I0313 09:39:40.829702  1737 provisioner.cpp:410] Provisioner recovery complete
> I0313 09:39:40.830343  1744 containerizer.cpp:1001] Starting container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 for executor 'executor' of framework
> I0313 09:39:40.834182  1738 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:40.836853  1735 linux_launcher.cpp:429] Launching container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0313 09:39:40.861481  1734 containerizer.cpp:1598] Checkpointing container's 
> forked pid 1856 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_Sf4Iv8/meta/slaves/frameworks/executors/execut
> or/runs/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/pids/forked.pid'
> I0313 09:39:40.867455  1733 containerizer.cpp:1766] Starting nested container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:40.870790  1742 linux_launcher.cpp:429] Launching nested 
> container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e and 
> cloning with namespaces CLONE_NEWNS | CLONE_NEW
> PID
> I0313 09:39:45.173310  1737 containerizer.cpp:2483] Container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e has 
> exited
> I0313 09:39:45.173354  1737 containerizer.cpp:2077] Destroying container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e in 
> RUNNING state
> I0313 09:39:45.173630  1731 linux_launcher.cpp:505] Asked to destroy 
> container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.174485  1731 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup 
> mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661
> b77fe7e
> I0313 09:39:45.177196  1744 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a66
> 1b77fe7e
> I0313 09:39:45.179316  1736 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd
> -a1b1-6a661b77fe7e after 2.063104ms
> I0313 09:39:45.181565  1733 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661
> b77fe7e
> I0313 09:39:45.183686  1746 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ec
> d-a1b1-6a661b77fe7e after 2.074112ms
> I0313 09:39:45.187661  1738 containerizer.cpp:2356] Checkpointing termination 
> state to nested container's runtime directory 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_kUGOVz/containers/2
> 323db69-fc07-4d2d-bad4-3c7ddce65cd8/containers/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e/termination'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:230: 
> Failure
> Expecting WIFEXITED(wait.get()->status()) but  
> WIFSIGNALED(wait.get()->status()) is true and WTERMSIG(wait.get()->status()) 
> is Killed
> I0313 09:39:45.188985  1737 containerizer.cpp:2077] Destroying container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 in RUNNING state
> I0313 09:39:45.189237  1734 linux_launcher.cpp:505] Asked to destroy 
> container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.189946  1734 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup 
> mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:40.861481  1734 containerizer.cpp:1598] Checkpointing container's 
> forked pid 1856 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_Sf4Iv8/meta/slaves/frameworks/execut[725/1914]or/runs/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/pids/forked.pid'
> I0313 09:39:40.867455  1733 containerizer.cpp:1766] Starting nested container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:40.870790  1742 linux_launcher.cpp:429] Launching nested 
> container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e and 
> cloning with namespaces CLONE_NEWNS | CLONE_NEW
> PID
> I0313 09:39:45.173310  1737 containerizer.cpp:2483] Container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e has 
> exited
> I0313 09:39:45.173354  1737 containerizer.cpp:2077] Destroying container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e in 
> RUNNING state
> I0313 09:39:45.173630  1731 linux_launcher.cpp:505] Asked to destroy 
> container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8.de4bf594-02fe-4ecd-a1b1-6a661b77fe7e
> I0313 09:39:45.174485  1731 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup 
> mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661
> b77fe7e
> I0313 09:39:45.177196  1744 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a66
> 1b77fe7e
> I0313 09:39:45.179316  1736 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd
> -a1b1-6a661b77fe7e after 2.063104ms
> I0313 09:39:45.181565  1733 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ecd-a1b1-6a661
> b77fe7e
> I0313 09:39:45.183686  1746 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos/de4bf594-02fe-4ec
> d-a1b1-6a661b77fe7e after 2.074112ms
> I0313 09:39:45.187661  1738 containerizer.cpp:2356] Checkpointing termination 
> state to nested container's runtime directory 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_kUGOVz/containers/2
> 323db69-fc07-4d2d-bad4-3c7ddce65cd8/containers/de4bf594-02fe-4ecd-a1b1-6a661b77fe7e/termination'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:230: 
> Failure
> Expecting WIFEXITED(wait.get()->status()) but  
> WIFSIGNALED(wait.get()->status()) is true and WTERMSIG(wait.get()->status()) 
> is Killed
> I0313 09:39:45.188985  1737 containerizer.cpp:2077] Destroying container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 in RUNNING state
> I0313 09:39:45.189237  1734 linux_launcher.cpp:505] Asked to destroy 
> container 2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.189946  1734 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup 
> mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.191498  1744 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
> I0313 09:39:45.191536  1734 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.297771  1735 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
>  after 106.187776ms
> I0313 09:39:45.297827  1745 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
>  after 106.283008ms
> I0313 09:39:45.300673  1746 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
> I0313 09:39:45.301230  1733 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
> I0313 09:39:45.303532  1742 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8/mesos
>  after 2.814976ms
> I0313 09:39:45.304054  1741 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b/2323db69-fc07-4d2d-bad4-3c7ddce65cd8
>  after 2.78272ms
> I0313 09:39:45.374223  1733 containerizer.cpp:2483] Container 
> 2323db69-fc07-4d2d-bad4-3c7ddce65cd8 has exited
> I0313 09:39:45.398809  1731 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b
> I0313 09:39:45.400909  1742 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b after 
> 2.054144ms
> I0313 09:39:45.403110  1739 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b
> I0313 09:39:45.405185  1733 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_03c9934d-4d27-408f-8476-ee1c1983ff1b after 
> 2.034176ms
> [  FAILED  ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested (4682 ms)
> [----------] 1 test from NestedMesosContainerizerTest (4683 ms total)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to