[ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204236#comment-16204236
 ] 

Jie Yu commented on MESOS-7504:
-------------------------------

Sounds like we should make the `getMountNamespaceTarget` function more robust. 
I think it doesn't account for pre-exec commands, which also run at the 2nd 
level (and are most likely short-running).

I think the algorithm can be: search two levels as we do right now, but ignore 
errors about failing to get the mount namespace of a process. If, in the end, 
we cannot find one, return an error. Otherwise, return the new mount namespace.

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-7504
>                 URL: https://issues.apache.org/jira/browse/MESOS-7504
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.3.0
>         Environment: Ubuntu 16.04
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>              Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such failure:
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 1881) listening on socket file 
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for 
> container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.522195 17191 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"sleep 
> 1000"},"enter_namespaces":[131072,536870912],"environment":{}}" 
> --pipe_read="32" --pipe_write="33" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/containers/ea991d38-e1a5-44fe-a522-622b15142e35"
>  --unshare_namespace_mnt="false"'
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:543: 
> Failure
> (launch).failure(): Cannot get target mount namespace from process 1873: 
> Cannot get 'mnt' namespace for child process '1885'
> I0509 21:53:25.536957 17191 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.638844 17192 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
>  after 101.84192ms
> I0509 21:53:25.639927 17189 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.640831 17189 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d
>  after 872960ns
> I0509 21:53:25.642843 17189 cgroups.cpp:2692] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.745189 17186 cgroups.cpp:1405] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after 
> 102.276096ms
> I0509 21:53:25.746119 17189 cgroups.cpp:2710] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753
> I0509 21:53:25.747002 17189 cgroups.cpp:1434] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753 after 
> 856064ns
> [  FAILED  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover (325 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
