[ https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193100#comment-16193100 ]
Andrei Budnik commented on MESOS-7504:
--------------------------------------

The containerizer launcher runs [pre-exec-hooks|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/launch.cpp#L384] before launching the given command (e.g. {{sleep 1000}}).

The {{NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover}} test uses {{"cgroups/cpu,filesystem/linux,namespaces/pid"}} isolation, where the {{filesystem/linux}} and {{namespaces/pid}} isolators add two pre-exec hooks. From the logs:
{code}
Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/abudnik\/mesos\/build\/src\/mesos-containerizer"}'
Executing pre-exec command '{"shell":true,"value":"mount -n -t proc proc \/proc -o nosuid,noexec,nodev"}'
{code}

After launching the parent container, we try to launch a nested container. The agent [calls|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/containerizer.cpp#L1758] the [getMountNamespaceTarget|https://github.com/apache/mesos/blob/46db7e4f27831d20244a57b22a70312f2a574395/src/slave/containerizer/mesos/utils.cpp#L59] function, which returns the "Cannot get target mount namespace from process" error in this test. If you take a look at it, you'll find that there is a small delay between enumerating all child processes (which might still include running pre-exec-hook processes) and calling {{ns::getns}} for each child process. If any pre-exec-hook process exits during this delay, {{ns::getns}} fails for that process, which produces this error message.

> Parent's mount namespace cannot be determined when launching a nested
> container.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-7504
>                 URL: https://issues.apache.org/jira/browse/MESOS-7504
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.3.0
>         Environment: Ubuntu 16.04
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>              Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an
> example of such failure:
> {noformat}
> [ RUN      ]
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation:
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
> for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching
> 'mesos-containerizer' with flags '--help="false"
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
> -n -t proc proc \/proc -o
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
> --pipe_read="29" --pipe_write="32"
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
> --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's
> forked pid 1873 to
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory:
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs"
> --help="false"
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
> --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32"
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false"
> --wait_for_connection="true"' for container
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard
> server (pid: 1881) listening on socket file
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for
> container
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.522195 17191 containerizer.cpp:1524] Launching
> 'mesos-containerizer' with flags '--help="false"
> --launch_info="{"command":{"shell":true,"value":"sleep
> 1000"},"enter_namespaces":[131072,536870912],"environment":{}}"
> --pipe_read="32" --pipe_write="33"
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/containers/ea991d38-e1a5-44fe-a522-622b15142e35"
> --unshare_namespace_mnt="false"'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:543:
> Failure
> (launch).failure(): Cannot get target mount namespace from process 1873:
> Cannot get 'mnt' namespace for child process '1885'
> I0509 21:53:25.536957 17191 cgroups.cpp:2692] Freezing cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.638844 17192 cgroups.cpp:1405] Successfully froze cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> after 101.84192ms
> I0509 21:53:25.639927 17189 cgroups.cpp:2710] Thawing cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.640831 17189 cgroups.cpp:1434] Successfully thawed cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> after 872960ns
> I0509 21:53:25.642843 17189 cgroups.cpp:2692] Freezing cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.745189 17186 cgroups.cpp:1405] Successfully froze cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after
> 102.276096ms
> I0509 21:53:25.746119 17189 cgroups.cpp:2710] Thawing cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.747002 17189 cgroups.cpp:1434] Successfully thawed cgroup
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after
> 856064ns
> [  FAILED  ]
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover (325
> ms)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
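
For illustration, the race described in the comment (a child process exits after it has been enumerated but before its namespace is read) can be modelled with a small Python sketch. This is not the Mesos implementation. The names {{find_mount_ns_target}} and {{proc_mnt_ns}} are hypothetical, the selection rule (first child whose mount namespace differs from the parent's) is an assumption about what the target lookup does, and skipping vanished children is only one possible way to tolerate the race.

```python
import os
from typing import Callable, Iterable, Optional


def find_mount_ns_target(
    parent: int,
    children: Iterable[int],
    get_mnt_ns: Callable[[int], Optional[int]],
) -> int:
    """Pick the pid whose mount namespace a nested container should enter.

    `get_mnt_ns` is a hypothetical lookup that returns the namespace inode
    for a pid, or None if the process exited before it could be inspected.
    """
    parent_ns = get_mnt_ns(parent)
    if parent_ns is None:
        raise RuntimeError(f"Cannot get 'mnt' namespace for process {parent}")

    for child in children:
        child_ns = get_mnt_ns(child)
        if child_ns is None:
            # The child (e.g. a short-lived pre-exec hook) exited after it
            # was enumerated; skip it instead of failing the whole launch.
            continue
        if child_ns != parent_ns:
            # Assumed rule: a child in a different mount namespace is the target.
            return child

    # No child lives in a different mount namespace; fall back to the parent.
    return parent


def proc_mnt_ns(pid: int) -> Optional[int]:
    """Linux-only lookup via /proc; returns None when the pid has vanished."""
    try:
        return os.stat(f"/proc/{pid}/ns/mnt").st_ino
    except (FileNotFoundError, ProcessLookupError):
        return None
```

With the pids from the log above, a lookup that no longer finds child 1885 makes the sketch skip it rather than return an error, whereas treating the failed lookup as fatal reproduces the "Cannot get 'mnt' namespace for child process" failure.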