[ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193100#comment-16193100
 ] 

Andrei Budnik commented on MESOS-7504:
--------------------------------------

The containerizer launcher spawns 
[pre-exec-hooks|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/launch.cpp#L384]
 before launching the given command (e.g. {{sleep 1000}}).
For the 
{{NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover}} 
test, the isolation is set to {{"cgroups/cpu,filesystem/linux,namespaces/pid"}}, 
where the {{filesystem/linux}} and {{namespaces/pid}} isolators add 2 
pre-exec-hooks. From the logs:
{code}
Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/abudnik\/mesos\/build\/src\/mesos-containerizer"}'
Executing pre-exec command '{"shell":true,"value":"mount -n -t proc proc \/proc 
-o nosuid,noexec,nodev"}'
{code}
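To illustrate why these hooks show up as short-lived child processes, here is a minimal sketch of running a shell pre-exec command before the main command. It is not the code from {{launch.cpp}}; {{runPreExecHook}} is a hypothetical helper, and running the hook through {{/bin/sh -c}} is an assumption modelled on the {{"shell":true}} entry above.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <string>

// Hypothetical helper (not part of Mesos): runs `command` through /bin/sh in
// a child process, waits for it, and returns its exit code, or -1 if the
// child could not be started or did not exit normally.
int runPreExecHook(const std::string& command)
{
  pid_t pid = ::fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: this is the short-lived process that an observer enumerating
    // the launcher's children may see for a brief moment.
    ::execl("/bin/sh", "sh", "-c", command.c_str(),
            static_cast<char*>(nullptr));
    ::_exit(127); // Only reached if exec failed.
  }

  // Parent: wait for the hook to finish, retrying if interrupted by a signal.
  int status = 0;
  while (::waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A launcher following this pattern would call the helper once per hook and only then exec the actual command, so each hook exists as a child process only for as long as it takes to run.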
After launching the parent container, we try to launch a nested container. The agent 
[calls|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/containerizer.cpp#L1758]
 the 
[getMountNamespaceTarget|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/utils.cpp#L59]
 function, which returns the "Cannot get target mount namespace from process" 
error in this test.
If you look at this function, you'll see that there is a small delay between 
enumerating all child processes (which might still include running 
pre-exec-hook processes) and calling {{ns::getns}} for each child 
process. If any of the pre-exec-hook processes exits during this delay, 
the {{ns::getns}} call for that process fails, which causes this error message.
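The race can be sketched as follows. This is an illustration only, not the Mesos implementation and not a proposed patch: {{mountNamespaceOf}} and {{findTarget}} are hypothetical names, the namespace is identified by calling {{stat}} on {{/proc/<pid>/ns/mnt}} instead of using {{ns::getns}}, and skipping a vanished child is just one possible way to tolerate the race.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the inode that identifies the mount namespace
// of `pid`, or nullopt if /proc/<pid>/ns/mnt cannot be read, e.g. because the
// process has already exited.
std::optional<ino_t> mountNamespaceOf(pid_t pid)
{
  const std::string path = "/proc/" + std::to_string(pid) + "/ns/mnt";

  struct stat s;
  if (::stat(path.c_str(), &s) != 0) {
    return std::nullopt;
  }

  return s.st_ino;
}

// Hypothetical helper: given a parent and a previously enumerated list of its
// children, returns the first child whose mount namespace differs from the
// parent's, or the parent itself if there is none. A child that exited after
// the enumeration is skipped instead of being treated as a fatal error.
std::optional<pid_t> findTarget(pid_t parent, const std::vector<pid_t>& children)
{
  const std::optional<ino_t> parentNs = mountNamespaceOf(parent);
  if (!parentNs) {
    return std::nullopt; // The parent itself is gone; nothing to enter.
  }

  for (pid_t child : children) {
    const std::optional<ino_t> childNs = mountNamespaceOf(child);
    if (!childNs) {
      continue; // Child exited between enumeration and lookup.
    }
    if (*childNs != *parentNs) {
      return child;
    }
  }

  return parent;
}
```

In this sketch a pre-exec-hook process that disappears between the enumeration and the lookup is simply ignored, whereas returning an error at that point would reproduce the failure seen in the test.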

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-7504
>                 URL: https://issues.apache.org/jira/browse/MESOS-7504
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.3.0
>         Environment: Ubuntu 16.04
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>              Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 1881) listening on socket file 
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for 
> container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.522195 17191 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"sleep 
> 1000"},"enter_namespaces":[131072,536870912],"environment":{}}" 
> --pipe_read="32" --pipe_write="33" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/containers/ea991d38-e1a5-44fe-a522-622b15142e35"
>  --unshare_namespace_mnt="false"'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:543: 
> Failure
> (launch).failure(): Cannot get target mount namespace from process 1873: 
> Cannot get 'mnt' namespace for child process '1885'
> I0509 21:53:25.536957 17191 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.638844 17192 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
>  after 101.84192ms
> I0509 21:53:25.639927 17189 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.640831 17189 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
>  after 872960ns
> I0509 21:53:25.642843 17189 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.745189 17186 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after 
> 102.276096ms
> I0509 21:53:25.746119 17189 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.747002 17189 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after 
> 856064ns
> [  FAILED  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover (325 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
