[ https://issues.apache.org/jira/browse/MESOS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avinash Sridharan reassigned MESOS-5225:
----------------------------------------

    Assignee: Avinash Sridharan  (was: Qian Zhang)

> Command executor can not start when joining a CNI network
> ---------------------------------------------------------
>
>                 Key: MESOS-5225
>                 URL: https://issues.apache.org/jira/browse/MESOS-5225
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>            Reporter: Qian Zhang
>            Assignee: Avinash Sridharan
>
> Steps to reproduce:
> 1. Start the master.
> {code}
> sudo ./bin/mesos-master.sh --work_dir=/tmp
> {code}
>
> 2. Start the agent.
> {code}
> sudo ./bin/mesos-slave.sh --master=192.168.122.171:5050 \
>   --containerizers=mesos --image_providers=docker \
>   --isolation=filesystem/linux,docker/runtime,network/cni \
>   --network_cni_config_dir=/opt/cni/net_configs \
>   --network_cni_plugins_dir=/opt/cni/plugins
> {code}
>
> 3. Launch a command task with mesos-execute that joins the CNI network {{net1}}.
> {code}
> sudo src/mesos-execute --master=192.168.122.171:5050 --name=test \
>   --docker_image=library/busybox --networks=net1 --command="sleep 10" \
>   --shell=true
> I0418 08:25:35.746758 24923 scheduler.cpp:177] Version: 0.29.0
> Subscribed with ID '3c4796f0-eee7-4939-a036-7c6387c370eb-0000'
> Submitted task 'test' to agent 'b74535d8-276f-4e09-ab47-53e3721ab271-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Executor terminated'
>   source: SOURCE_AGENT
>   reason: REASON_EXECUTOR_TERMINATED
> {code}
> So the task failed with the reason "executor terminated".
> Here is the agent log:
> {code}
> I0418 08:25:35.804873 24911 slave.cpp:1514] Got assigned task test for framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:35.807937 24911 slave.cpp:1633] Launching task test for framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:35.812503 24911 paths.cpp:528] Trying to chown '/tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test/runs/2b29d6d6-b314-477f-b734-7771d07d41e3' to user 'root'
> I0418 08:25:35.820339 24911 slave.cpp:5620] Launching executor test of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test/runs/2b29d6d6-b314-477f-b734-7771d07d41e3'
> I0418 08:25:35.822576 24914 containerizer.cpp:698] Starting container '2b29d6d6-b314-477f-b734-7771d07d41e3' for executor 'test' of framework '3c4796f0-eee7-4939-a036-7c6387c370eb-0000'
> I0418 08:25:35.825996 24911 slave.cpp:1851] Queuing task 'test' for executor 'test' of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:35.832348 24911 provisioner.cpp:285] Provisioning image rootfs '/tmp/mesos/provisioner/containers/2b29d6d6-b314-477f-b734-7771d07d41e3/backends/copy/rootfses/d219ec3a-ea31-45f6-b578-a62cd02392e7' for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:36.061249 24913 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNET | CLONE_NEWUTS | CLONE_NEWNS
> I0418 08:25:36.071208 24915 cni.cpp:643] Bind mounted '/proc/24950/ns/net' to '/run/mesos/isolators/network/cni/2b29d6d6-b314-477f-b734-7771d07d41e3/ns' for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:36.250573 24916 cni.cpp:962] Got assigned IPv4 address '192.168.1.2/24' from CNI network 'net1' for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:36.252002 24917 cni.cpp:765] Unable to find DNS nameservers for container 2b29d6d6-b314-477f-b734-7771d07d41e3. Using host '/etc/resolv.conf'
> I0418 08:25:37.663487 24916 containerizer.cpp:1696] Executor for container '2b29d6d6-b314-477f-b734-7771d07d41e3' has exited
> I0418 08:25:37.663745 24916 containerizer.cpp:1461] Destroying container '2b29d6d6-b314-477f-b734-7771d07d41e3'
> I0418 08:25:37.670574 24915 cgroups.cpp:2676] Freezing cgroup /sys/fs/cgroup/freezer/mesos/2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.676864 24912 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/2b29d6d6-b314-477f-b734-7771d07d41e3 after 6.061056ms
> I0418 08:25:37.680552 24913 cgroups.cpp:2694] Thawing cgroup /sys/fs/cgroup/freezer/mesos/2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.683346 24913 cgroups.cpp:1438] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/2b29d6d6-b314-477f-b734-7771d07d41e3 after 2.46016ms
> I0418 08:25:37.874023 24914 cni.cpp:1121] Unmounted the network namespace handle '/run/mesos/isolators/network/cni/2b29d6d6-b314-477f-b734-7771d07d41e3/ns' for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.874194 24914 cni.cpp:1132] Removed the container directory '/run/mesos/isolators/network/cni/2b29d6d6-b314-477f-b734-7771d07d41e3'
> I0418 08:25:37.877306 24912 linux.cpp:814] Ignoring unmounting sandbox/work directory for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.879295 24912 provisioner.cpp:338] Destroying container rootfs at '/tmp/mesos/provisioner/containers/2b29d6d6-b314-477f-b734-7771d07d41e3/backends/copy/rootfses/d219ec3a-ea31-45f6-b578-a62cd02392e7' for container 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.970871 24914 slave.cpp:4113] Executor 'test' of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000 exited with status 1
> I0418 08:25:37.975452 24914 slave.cpp:3201] Handling status update TASK_FAILED (UUID: a5e19b2d-b234-4adc-8791-9046af4c1395) for task test of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000 from @0.0.0.0:0
> W0418 08:25:37.978974 24911 containerizer.cpp:1303] Ignoring update for unknown container: 2b29d6d6-b314-477f-b734-7771d07d41e3
> I0418 08:25:37.980370 24917 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: a5e19b2d-b234-4adc-8791-9046af4c1395) for task test of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:37.983105 24913 slave.cpp:3599] Forwarding the update TASK_FAILED (UUID: a5e19b2d-b234-4adc-8791-9046af4c1395) for task test of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000 to master@192.168.122.171:5050
> I0418 08:25:38.017352 24917 slave.cpp:2232] Asked to shut down framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000 by master@192.168.122.171:5050
> I0418 08:25:38.018487 24917 slave.cpp:2257] Shutting down framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:38.019630 24917 slave.cpp:4217] Cleaning up executor 'test' of framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:38.020967 24911 gc.cpp:55] Scheduling '/tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test/runs/2b29d6d6-b314-477f-b734-7771d07d41e3' for gc 6.99999975983704days in the future
> I0418 08:25:38.022328 24917 slave.cpp:4305] Cleaning up framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:38.022847 24915 status_update_manager.cpp:282] Closing status update streams for framework 3c4796f0-eee7-4939-a036-7c6387c370eb-0000
> I0418 08:25:38.022459 24912 gc.cpp:55] Scheduling '/tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test' for gc 6.99999974402963days in the future
> I0418 08:25:38.023483 24916 gc.cpp:55] Scheduling '/tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000' for gc 6.99999973582222days in the future
> ...
> {code}
> And this is the stderr of the executor:
> {code}
> cat /tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test/runs/2b29d6d6-b314-477f-b734-7771d07d41e3/stderr
>
> + /home/stack/workspace/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/
> + grep -E /tmp/mesos/.+ /proc/self/mountinfo
> + grep -v 2b29d6d6-b314-477f-b734-7771d07d41e3
> + cut -d -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind /tmp/mesos/provisioner/containers/2b29d6d6-b314-477f-b734-7771d07d41e3/backends/copy/rootfses/d219ec3a-ea31-45f6-b578-a62cd02392e7 /tmp/mesos/slaves/b74535d8-276f-4e09-ab47-53e3721ab271-S0/frameworks/3c4796f0-eee7-4939-a036-7c6387c370eb-0000/executors/test/runs/2b29d6d6-b314-477f-b734-7771d07d41e3/.rootfs
> Failed to obtain the IP address for '2b29d6d6-b314-477f-b734-7771d07d41e3'; the DNS service may not be able to resolve it: Name or service not known
> {code}
> So the executor terminated because the libprocess instance inside it failed to resolve its hostname {{2b29d6d6-b314-477f-b734-7771d07d41e3}}; see
> https://github.com/apache/mesos/blob/0.28.0/3rdparty/libprocess/src/process.cpp#L929:L935
> for details.
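
The failure mode described above can be illustrated with a minimal sketch. This is not libprocess code (libprocess is C++); it only assumes that the check amounts to looking up an IPv4 address for the local hostname, which inside the container is the container ID. The function name `resolve_ipv4` is hypothetical.

```python
import socket
from typing import Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return the first IPv4 address found for hostname, or None if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as error:
        # An unresolvable name typically surfaces here as
        # "Name or service not known", as in the executor's stderr.
        print(f"Failed to obtain the IP address for '{hostname}': {error}")
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return infos[0][4][0]


if __name__ == "__main__":
    # Run inside the container's UTS namespace, this looks up the container ID,
    # which neither /etc/hosts nor the host's resolv.conf is likely to know about.
    print(resolve_ipv4(socket.gethostname()))
```

Running this in the container's namespaces, if the assumption holds, should show whether the hostname is resolvable before the executor is started.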