[ https://issues.apache.org/jira/browse/MESOS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Mazumdar updated MESOS-7381:
----------------------------------
Priority: Blocker (was: Critical)
Description:
About 12-13 tests in NestedMesosContainerizerTest are very flaky in my environment (Debian jessie running a 4.4.38 kernel).

The primary cause seems to be that the root container "failed", which caused `reap` to be called on it. This cascades and causes both containers to be actively killed by the containerizer instead of terminating by themselves.

This happens on the master branch at commit 6c1e20c0f2777d9bb831be3ff43c885b253af7bb

Some log:
{panel}
[ RUN ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested
I0412 16:06:29.698456 16110 containerizer.cpp:221] Using isolation: cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
I0412 16:06:29.703445 16110 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0412 16:06:29.704658 16110 provisioner.cpp:249] Using default backend 'overlay'
I0412 16:06:29.714778 16125 containerizer.cpp:608] Recovering containerizer
I0412 16:06:29.718374 16125 provisioner.cpp:410] Provisioner recovery complete
I0412 16:06:29.719195 16127 containerizer.cpp:1001] Starting container 7cd8794a-4c4f-43e2-8824-2459dc57753d for executor 'executor' of framework
I0412 16:06:29.721225 16128 cgroups.cpp:410] Creating cgroup at '/sys/fs/cgroup/cpu,cpuacct/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d' for container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.723470 16128 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 1) for container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.727627 16126 containerizer.cpp:1499] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/uber\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount -n -t proc proc \/proc -o nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb"}" --pipe_read="7" --pipe_write="8" --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d" --unshare_namespace_mnt="false"'
I0412 16:06:29.728123 16131 linux_launcher.cpp:429] Launching container 7cd8794a-4c4f-43e2-8824-2459dc57753d and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
I0412 16:06:29.750948 16126 containerizer.cpp:1598] Checkpointing container's forked pid 16164 to '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_SX8Ner/meta/slaves/frameworks/executors/executor/runs/7cd8794a-4c4f-43e2-8824-2459dc57753d/pids/forked.pid'
I0412 16:06:29.755005 16130 fetcher.cpp:353] Starting to fetch URIs for container: 7cd8794a-4c4f-43e2-8824-2459dc57753d, directory: /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb
I0412 16:06:29.758663 16124 containerizer.cpp:1766] Starting nested container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.762702 16126 containerizer.cpp:1499] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"exit 42"},"enter_namespaces":[536870912],"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb\/containers\/fc7f3523-e348-43f3-b809-3be02e35315c"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/uber\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount -n -t proc proc \/proc -o nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb\/containers\/fc7f3523-e348-43f3-b809-3be02e35315c"}" --pipe_read="7" --pipe_write="8" --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d/containers/fc7f3523-e348-43f3-b809-3be02e35315c" --unshare_namespace_mnt="false"'
I0412 16:06:29.763293 16124 linux_launcher.cpp:429] Launching nested container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
I0412 16:06:29.771055 16131 fetcher.cpp:353] Starting to fetch URIs for container: 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c, directory: /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb/containers/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.888044 16130 containerizer.cpp:2483] Container 7cd8794a-4c4f-43e2-8824-2459dc57753d has exited
I0412 16:06:29.888092 16130 containerizer.cpp:2077] Destroying container 7cd8794a-4c4f-43e2-8824-2459dc57753d in RUNNING state
I0412 16:06:29.888116 16130 containerizer.cpp:2077] Destroying container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c in RUNNING state
I0412 16:06:29.888913 16130 containerizer.cpp:2483] Container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c has exited
I0412 16:06:29.889135 16129 linux_launcher.cpp:505] Asked to destroy container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.889752 16129 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.891119 16124 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.892491 16130 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c after 1.32096ms
I0412 16:06:29.894062 16127 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.895290 16127 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c after 1.184768ms
I0412 16:06:29.900456 16130 provisioner.cpp:484] Ignoring destroy request for unknown container 7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.900617 16128 containerizer.cpp:2356] Checkpointing termination state to nested container's runtime directory '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d/containers/fc7f3523-e348-43f3-b809-3be02e35315c/termination'
../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:230: Failure
Expecting WIFEXITED(wait.get()->status()) but WIFSIGNALED(wait.get()->status()) is true and WTERMSIG(wait.get()->status()) is Killed
I0412 16:06:29.901587 16125 linux_launcher.cpp:505] Asked to destroy container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.902186 16125 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.903249 16125 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
I0412 16:06:29.903316 16124 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.006909 16130 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d after 103.539968ms
I0412 16:06:30.007313 16124 cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos after 104.015872ms
I0412 16:06:30.008755 16128 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
I0412 16:06:30.011144 16125 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.012442 16125 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d after 1.254912ms
I0412 16:06:30.111548 16131 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos after 102.735872ms
I0412 16:06:30.118083 16126 provisioner.cpp:484] Ignoring destroy request for unknown container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.142882 16130 cgroups.cpp:2692] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8
I0412 16:06:30.144379 16126
cgroups.cpp:1405] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8 after 1.382144ms
I0412 16:06:30.145800 16131 cgroups.cpp:2710] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8
I0412 16:06:30.147094 16131 cgroups.cpp:1434] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8 after 1.203968ms
[ FAILED ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested (500 ms)
{panel}

> Flaky tests in NestedMesosContainerizerTest
> -------------------------------------------
>
> Key: MESOS-7381
> URL: https://issues.apache.org/jira/browse/MESOS-7381
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.1.1, 1.2.0
> Reporter: Zhitao Li
> Priority: Blocker

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
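
For readers unfamiliar with the assertion at nested_mesos_containerizer_tests.cpp:230, the following is a minimal Python sketch (not Mesos code) of how a POSIX wait status distinguishes a process that exited on its own from one that was killed by a signal. The nested container's command in the log is `exit 42`, so the test expects a normal exit, but the log reports SIGKILL instead. `describe_wait_status` is a hypothetical helper, and the raw status values assume the conventional Linux encoding.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Summarize a raw POSIX wait status (hypothetical helper, not a Mesos API)."""
    if os.WIFEXITED(status):
        # The process called exit() or returned from main.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # The process was terminated by a signal it did not handle.
        return f"killed by {signal.Signals(os.WTERMSIG(status)).name}"
    return "neither exited nor signaled"


# Assumption: conventional Linux encoding of the raw status, with the
# exit code in bits 8-15 and the terminating signal in the low 7 bits.
expected_status = 42 << 8              # what a clean `exit 42` would yield
observed_status = int(signal.SIGKILL)  # what the failing test reported

print(describe_wait_status(expected_status))
print(describe_wait_status(observed_status))
```

The first status corresponds to what the test expects (WIFEXITED is true); the second corresponds to the reported failure (WIFSIGNALED is true and WTERMSIG is SIGKILL), which fits the containerizer destroying the nested container before its command could exit by itself.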