[ https://issues.apache.org/jira/browse/MESOS-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383515#comment-15383515 ]
aiminlei edited comment on MESOS-5859 at 7/20/16 1:29 AM:
----------------------------------------------------------

staged task: rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a

mesos-slave log:
[root@mesos-cluster-10 log]# cat mesos-slave.mesos-cluster-10.37.2.35.invalid-user.log.* | grep "rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a"
I0719 09:17:31.280827 130624 slave.cpp:1360] Got assigned task rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a for framework 34b58730-aaa6-4660-8e38-1148bf738459-0000
I0719 09:17:31.292891 130624 slave.cpp:1479] Launching task rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a for framework 34b58730-aaa6-4660-8e38-1148bf738459-0000
I0719 09:17:31.293154 130624 paths.cpp:472] Trying to chown '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a/runs/fdbc2f1f-2240-4187-bf33-d2793dc86995' to user 'root'
I0719 09:17:31.298758 130624 slave.cpp:5281] Launching executor rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a/runs/fdbc2f1f-2240-4187-bf33-d2793dc86995'
I0719 09:17:31.301000 130624 slave.cpp:1697] Queuing task 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' for executor 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000
I0719 09:17:31.305286 130601 docker.cpp:803] Starting container 'fdbc2f1f-2240-4187-bf33-d2793dc86995' for task 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' (and executor 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a') of framework '34b58730-aaa6-4660-8e38-1148bf738459-0000'
I0719 09:17:32.140924 130632 docker.cpp:409] Checkpointing pid 86399 to '/opt/sncc/data/mesos-slave/meta/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a/runs/fdbc2f1f-2240-4187-bf33-d2793dc86995/pids/forked.pid'
I0719 09:17:32.172355 130626 slave.cpp:2642] Got registration for executor 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000 from executor(1)@10.37.2.35:36139
I0719 09:17:32.178768 130624 slave.cpp:1862] Sending queued task 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' to executor 'rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a' of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000 at executor(1)@10.37.2.35:36139
I0719 09:20:10.351904 130629 slave.cpp:1890] Asked to kill task rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000
I0719 09:20:15.369649 130628 slave.cpp:1890] Asked to kill task rep-test49.90b9c489-4d4e-11e6-b8f7-02427baebb1a of framework 34b58730-aaa6-4660-8e38-1148bf738459-0000

docker.log: can not find "fdbc2f1f-2240-4187-bf33-d2793dc86995"
[root@mesos-cluster-10 log]# cat docker.log | grep "fdbc2f1f-2240-4187-bf33-d2793dc86995"
[root@mesos-cluster-10 log]#

mesos-docker-executor process stack:
#0 0x00007f4cd67316d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f4cd64cf9ec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x00007f4cd7b8f610 in process::ProcessManager::wait(process::UPID const&) () from /lib/libmesos-0.27.2.so
#3 0x00007f4cd7b8fc97 in process::wait(process::UPID const&, Duration const&) () from /lib/libmesos-0.27.2.so
#4 0x00007f4cd7b65491 in process::Latch::await(Duration const&) () from /lib/libmesos-0.27.2.so
#5 0x00007f4cd733e5a7 in mesos::MesosExecutorDriver::join() () from /lib/libmesos-0.27.2.so
#6 0x0000000000419562 in main ()

stderr:
I0719 09:17:31.320283 86018 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/8aecba0e-3d6d-492d-8908-d31e2d019343-S4","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/opt\/sncc\/data\/mesos-slave\/slaves\/8aecba0e-3d6d-492d-8908-d31e2d019343-S4\/frameworks\/34b58730-aaa6-4660-8e38-1148bf738459-0000\/executors\/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a\/runs\/f86e54b1-de27-4535-ba34-435d5a7a5b29"}
I0719 09:17:31.321838 86018 fetcher.cpp:379] Fetching URI 'file:///etc/docker.tar.gz'
I0719 09:17:31.321863 86018 fetcher.cpp:250] Fetching directly into the sandbox directory
I0719 09:17:31.321879 86018 fetcher.cpp:187] Fetching URI 'file:///etc/docker.tar.gz'
I0719 09:17:31.321895 86018 fetcher.cpp:167] Copying resource with command:cp '/etc/docker.tar.gz' '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29/docker.tar.gz'
I0719 09:17:31.324514 86018 fetcher.cpp:84] Extracting with command: tar -C '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29' -xf '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29/docker.tar.gz'
I0719 09:17:31.328435 86018 fetcher.cpp:92] Extracted '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29/docker.tar.gz' into
'/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29'
I0719 09:17:31.328470 86018 fetcher.cpp:456] Fetched 'file:///etc/docker.tar.gz' to '/opt/sncc/data/mesos-slave/slaves/8aecba0e-3d6d-492d-8908-d31e2d019343-S4/frameworks/34b58730-aaa6-4660-8e38-1148bf738459-0000/executors/rep-test25.90b9eb9a-4d4e-11e6-b8f7-02427baebb1a/runs/f86e54b1-de27-4535-ba34-435d5a7a5b29/docker.tar.gz'
I0719 09:17:32.169039 86294 exec.cpp:134] Version: 0.27.2

> some tasks is always in staged state
> ------------------------------------
>
>                 Key: MESOS-5859
>                 URL: https://issues.apache.org/jira/browse/MESOS-5859
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 0.27.2
>         Environment: mesos 0.27.2 + marathon 0.14.0 + docker 1.10.3
> Operating system: centos7, kernel version: 3.10.0-327.22.2.el7.centos.plus.x86_64
>            Reporter: aiminlei
>            Priority: Critical
>
> When I create 30*2 apps through the Marathon API on one mesos-slave node, most
> tasks are created successfully, but two tasks always stay in the staged state.
> On the mesos-slave, the mesos-docker-executor process is running, but the docker
> container was not created. Judging from docker.log, docker never received the
> request to create the container.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
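
To make it easier to compare a stuck task against a healthy one, the manual `cat ... | grep` step over the mesos-slave log could be scripted roughly as below. This is only a sketch: the function name `task_timeline` and the stage labels are made up for illustration, the message prefixes are copied from the slave log excerpt in this comment, and other Mesos versions may word them differently.

```python
import re

# glog-style prefix as seen in the excerpt:
# severity letter, MMDD, HH:MM:SS.micros, thread id, source file:line]
GLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)

# (label, message prefix). Labels are hypothetical; prefixes come from
# the mesos-slave 0.27.2 log lines pasted in this report.
STAGES = [
    ("assigned", "Got assigned task"),
    ("launching", "Launching task"),
    ("queued", "Queuing task"),
    ("registered", "Got registration for executor"),
    ("sent", "Sending queued task"),
    ("kill_requested", "Asked to kill task"),
]


def task_timeline(lines, task_id):
    """Return (time, label) pairs for known stage messages that mention task_id."""
    timeline = []
    for line in lines:
        match = GLOG.match(line.strip())
        if not match or task_id not in match.group("msg"):
            continue  # not a glog line, or about a different task
        for label, prefix in STAGES:
            if match.group("msg").startswith(prefix):
                timeline.append((match.group("time"), label))
                break
    return timeline
```

Feeding it the lines of the slave log together with the task ID gives the ordered list of stages the slave logged for that task. In the excerpt above, the sequence would run from `assigned` through `sent` and then straight to `kill_requested`, with nothing from the executor in between.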