[ https://issues.apache.org/jira/browse/MESOS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7381:
----------------------------------
       Priority: Blocker  (was: Critical)
    Description: 

About 12-13 tests in NestedMesosContainerizerTest are very flaky in my 
environment (Debian/jessie running a 4.4.38 kernel).

The primary cause seems to be that the root container "failed", which caused 
`reap` to be called on it. This cascades, and both containers end up being 
actively killed by the containerizer instead of terminating by themselves.

This happens on the master branch at commit 6c1e20c0f2777d9bb831be3ff43c885b253af7bb.

Log excerpt:

{panel}
[ RUN      ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested
I0412 16:06:29.698456 16110 containerizer.cpp:221] Using isolation: 
cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
I0412 16:06:29.703445 16110 linux_launcher.cpp:150] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0412 16:06:29.704658 16110 provisioner.cpp:249] Using default backend 'overlay'
I0412 16:06:29.714778 16125 containerizer.cpp:608] Recovering containerizer
I0412 16:06:29.718374 16125 provisioner.cpp:410] Provisioner recovery complete
I0412 16:06:29.719195 16127 containerizer.cpp:1001] Starting container 
7cd8794a-4c4f-43e2-8824-2459dc57753d for executor 'executor' of framework 
I0412 16:06:29.721225 16128 cgroups.cpp:410] Creating cgroup at 
'/sys/fs/cgroup/cpu,cpuacct/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d'
 for container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.723470 16128 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 1) 
for container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.727627 16126 containerizer.cpp:1499] Launching 
'mesos-containerizer' with flags '--help="false" 
--launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
 
1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/uber\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
 -n -t proc proc \/proc -o 
nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb"}"
 --pipe_read="7" --pipe_write="8" 
--runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d"
 --unshare_namespace_mnt="false"'
I0412 16:06:29.728123 16131 linux_launcher.cpp:429] Launching container 
7cd8794a-4c4f-43e2-8824-2459dc57753d and cloning with namespaces CLONE_NEWNS | 
CLONE_NEWPID
I0412 16:06:29.750948 16126 containerizer.cpp:1598] Checkpointing container's 
forked pid 16164 to 
'/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_SX8Ner/meta/slaves/frameworks/executors/executor/runs/7cd8794a-4c4f-43e2-8824-2459dc57753d/pids/forked.pid'
I0412 16:06:29.755005 16130 fetcher.cpp:353] Starting to fetch URIs for 
container: 7cd8794a-4c4f-43e2-8824-2459dc57753d, directory: 
/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb
I0412 16:06:29.758663 16124 containerizer.cpp:1766] Starting nested container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.762702 16126 containerizer.cpp:1499] Launching 
'mesos-containerizer' with flags '--help="false" 
--launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"exit
 
42"},"enter_namespaces":[536870912],"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb\/containers\/fc7f3523-e348-43f3-b809-3be02e35315c"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/uber\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
 -n -t proc proc \/proc -o 
nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb\/containers\/fc7f3523-e348-43f3-b809-3be02e35315c"}"
 --pipe_read="7" --pipe_write="8" 
--runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d/containers/fc7f3523-e348-43f3-b809-3be02e35315c"
 --unshare_namespace_mnt="false"'
I0412 16:06:29.763293 16124 linux_launcher.cpp:429] Launching nested container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c and 
cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
I0412 16:06:29.771055 16131 fetcher.cpp:353] Starting to fetch URIs for 
container: 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c, 
directory: 
/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_AtLcOb/containers/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.888044 16130 containerizer.cpp:2483] Container 
7cd8794a-4c4f-43e2-8824-2459dc57753d has exited
I0412 16:06:29.888092 16130 containerizer.cpp:2077] Destroying container 
7cd8794a-4c4f-43e2-8824-2459dc57753d in RUNNING state
I0412 16:06:29.888116 16130 containerizer.cpp:2077] Destroying container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c in 
RUNNING state
I0412 16:06:29.888913 16130 containerizer.cpp:2483] Container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c has 
exited
I0412 16:06:29.889135 16129 linux_launcher.cpp:505] Asked to destroy container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.889752 16129 linux_launcher.cpp:548] Using freezer to destroy 
cgroup 
mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.891119 16124 cgroups.cpp:2692] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.892491 16130 cgroups.cpp:1405] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
 after 1.32096ms
I0412 16:06:29.894062 16127 cgroups.cpp:2710] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.895290 16127 cgroups.cpp:1434] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos/fc7f3523-e348-43f3-b809-3be02e35315c
 after 1.184768ms
I0412 16:06:29.900456 16130 provisioner.cpp:484] Ignoring destroy request for 
unknown container 
7cd8794a-4c4f-43e2-8824-2459dc57753d.fc7f3523-e348-43f3-b809-3be02e35315c
I0412 16:06:29.900617 16128 containerizer.cpp:2356] Checkpointing termination 
state to nested container's runtime directory 
'/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_LaunchNested_CfIMuj/containers/7cd8794a-4c4f-43e2-8824-2459dc57753d/containers/fc7f3523-e348-43f3-b809-3be02e35315c/termination'
../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:230: Failure
Expecting WIFEXITED(wait.get()->status()) but  
WIFSIGNALED(wait.get()->status()) is true and WTERMSIG(wait.get()->status()) is 
Killed
I0412 16:06:29.901587 16125 linux_launcher.cpp:505] Asked to destroy container 
7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.902186 16125 linux_launcher.cpp:548] Using freezer to destroy 
cgroup 
mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:29.903249 16125 cgroups.cpp:2692] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
I0412 16:06:29.903316 16124 cgroups.cpp:2692] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.006909 16130 cgroups.cpp:1405] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
 after 103.539968ms
I0412 16:06:30.007313 16124 cgroups.cpp:1405] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
 after 104.015872ms
I0412 16:06:30.008755 16128 cgroups.cpp:2710] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
I0412 16:06:30.011144 16125 cgroups.cpp:2710] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.012442 16125 cgroups.cpp:1434] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d
 after 1.254912ms
I0412 16:06:30.111548 16131 cgroups.cpp:1434] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8/7cd8794a-4c4f-43e2-8824-2459dc57753d/mesos
 after 102.735872ms
I0412 16:06:30.118083 16126 provisioner.cpp:484] Ignoring destroy request for 
unknown container 7cd8794a-4c4f-43e2-8824-2459dc57753d
I0412 16:06:30.142882 16130 cgroups.cpp:2692] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8
I0412 16:06:30.144379 16126 cgroups.cpp:1405] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8 after 
1.382144ms
I0412 16:06:30.145800 16131 cgroups.cpp:2710] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8
I0412 16:06:30.147094 16131 cgroups.cpp:1434] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_723562d9-904b-4108-aaab-973184996ab8 after 
1.203968ms
[  FAILED  ] NestedMesosContainerizerTest.ROOT_CGROUPS_LaunchNested (500 ms)
{panel}
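
The failed expectation at nested_mesos_containerizer_tests.cpp:230 distinguishes a process that exited by itself from one that was killed by a signal. The following is a minimal standalone sketch, not Mesos code, of how those two outcomes look through the POSIX wait-status macros. The helper names `describeStatus` and `runChild` are hypothetical and exist only for illustration. The child's exit code 42 mirrors the nested container's `exit 42` command from the log.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: translates a raw wait status into a short description,
// using the same POSIX macros that appear in the failed expectation.
std::string describeStatus(int status)
{
  if (WIFEXITED(status)) {
    return "exited with status " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "killed by signal " + std::to_string(WTERMSIG(status));
  }
  return "other";
}

// Hypothetical helper: forks a child and returns its wait status, or -1 on
// error. With `killChild` false, the child exits on its own with code 42.
// With `killChild` true, the child blocks and the parent sends SIGKILL,
// which stands in for a container being destroyed from the outside.
int runChild(bool killChild)
{
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    if (killChild) {
      while (true) {
        pause(); // Block until a signal arrives.
      }
    }
    _exit(42);
  }

  if (killChild && kill(pid, SIGKILL) != 0) {
    return -1;
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return status;
}
```

Under these assumptions, `describeStatus(runChild(false))` reports a normal exit with status 42, which is what the test expects, while `describeStatus(runChild(true))` reports termination by SIGKILL, which corresponds to the "Killed" result in the log above.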


> Flaky tests in NestedMesosContainerizerTest
> -------------------------------------------
>
>                 Key: MESOS-7381
>                 URL: https://issues.apache.org/jira/browse/MESOS-7381
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.1.1, 1.2.0
>            Reporter: Zhitao Li
>            Priority: Blocker



