[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516181#comment-15516181 ]
Ian Babrou commented on MESOS-6118:
-----------------------------------

I also experience this issue:

{noformat}
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763520 4995 slave.cpp:3211] Handling status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 from executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763664 4991 slave.cpp:6014] Terminating task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763825 5002 docker.cpp:972] Running docker -H unix:///var/run/docker.sock inspect mesos-dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2.c40cea8c-31a9-468f-a183-ed9851cd5aa8
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821267 4987 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821296 4987 status_update_manager.cpp:825] Checkpointing UPDATE for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844871 4987 status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to the agent
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844970 5009 slave.cpp:3604] Forwarding the update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to master@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845062 5009 slave.cpp:3498] Status update manager successfully handled status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845074 5009 slave.cpp:3514] Sending acknowledgement for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to executor(1)@10.10.23.25:46833
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.864859 4987 slave.cpp:3686] Received ping from slave-observer(149)@10.10.11.16:5050
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.955936 4995 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.956001 4995 status_update_manager.cpp:825] Checkpointing ACK for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.982950 4995 status_update_manager.cpp:528] Cleaning up status update stream for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983119 4995 slave.cpp:2597] Status update manager successfully handled status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001
Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983131 4995 slave.cpp:6055] Completing task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667191 4981 process.cpp:3323] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/state'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667413 4983 http.cpp:270] HTTP GET for /slave(1)/state from 10.10.19.24:33570 with User-Agent='Go-http-client/1.1'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.669677 5012 process.cpp:3323] Handling HTTP event for process 'files' with path: '/files/download'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.670250 5005 process.cpp:1280] Sending file at '/state/var/lib/mesos/slaves/dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2/frameworks/20150606-001827-252388362-5050-5982-0001/executors/pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1/runs/c40cea8c-31a9-468f-a183-ed9851cd5aa8/stdout' with length 1335
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.765249 5008 slave.cpp:3732] executor(1)@10.10.23.25:46833 exited
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.783426 4982 process.cpp:3323] Handling HTTP event for process 'files' with path: '/files/download'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.783843 4995 process.cpp:1280] Sending file at '/state/var/lib/mesos/slaves/dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2/frameworks/20150606-001827-252388362-5050-5982-0001/executors/pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1/runs/c40cea8c-31a9-468f-a183-ed9851cd5aa8/stderr' with length 3543
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.826164 5000 docker.cpp:2132] Executor for container 'c40cea8c-31a9-468f-a183-ed9851cd5aa8' has exited
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.826181 5000 docker.cpp:1852] Destroying container 'c40cea8c-31a9-468f-a183-ed9851cd5aa8'
Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.826207 5000 docker.cpp:1980] Running docker stop on container 'c40cea8c-31a9-468f-a183-ed9851cd5aa8'
Sep 23 11:07:40 myhost mesos-agent[4980]: F0923 11:07:40.826529 5000 fs.cpp:140] Check failed: !visitedParents.contains(parentId)
Sep 23 11:07:40 myhost mesos-agent[4980]: *** Check failure stack trace: ***
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd98953d google::LogMessage::Fail()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd98b1bd google::LogMessage::SendToLog()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd989102 google::LogMessage::Flush()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd98bba9 google::LogMessageFatal::~LogMessageFatal()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd45883d _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd4587a5 _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd45fc5a mesos::internal::fs::MountInfoTable::read()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd213346 mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd22f157 mesos::internal::slave::DockerContainerizerProcess::___destroy()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd92d094 process::ProcessManager::resume()
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dd92d3b7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dc007970 (unknown)
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50dbb260a4 start_thread
Sep 23 11:07:40 myhost mesos-agent[4980]: @ 0x7f50db85b87d (unknown)
Sep 23 11:07:40 myhost systemd[1]: mesos-agent.service: main process exited, code=killed, status=6/ABRT
Sep 23 11:07:40 myhost systemd[1]: Unit mesos-agent.service entered failed state.
{noformat}

> Agent would crash with docker container tasks due to host mount table read.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-6118
>                 URL: https://issues.apache.org/jira/browse/MESOS-6118
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 1.0.1
>         Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
> Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Jamie Briant
>            Assignee: Kevin Klues
>            Priority: Critical
>              Labels: linux, slave
>             Fix For: 1.1.0, 1.0.2
>
>         Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (each lasting a few seconds to a few minutes) over a period of several minutes. In 1.0.1, the slave process crashes every few minutes (with systemd restarting it).
> The crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678 1232 fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: ***
> Version 1.0.0 does not exhibit this issue.
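The fatal line in both traces is the `CHECK(!visitedParents.contains(parentId))` at fs.cpp:140, reached from DockerContainerizerProcess::unmountPersistentVolumes() via mesos::internal::fs::MountInfoTable::read(). Below is a minimal standalone sketch of the invariant that CHECK enforces: while following a mount entry's chain of parent IDs through /proc/self/mountinfo, no parent ID should be seen twice. This is illustrative C++ only, not the Mesos implementation (the real code sorts the table recursively, and MountEntry, hasParentCycle, and the sample table here are hypothetical). What it demonstrates is that a mountinfo snapshot which is not a clean tree, as can presumably happen when the table is read while Docker containers are churning mounts, trips exactly this condition, and in 1.0.1 that aborts the whole agent instead of surfacing an error.

{noformat}
// Standalone illustration (not Mesos source) of the invariant behind
// "Check failed: !visitedParents.contains(parentId)" at fs.cpp:140.
#include <iostream>
#include <map>
#include <set>
#include <string>

// Two fields of a /proc/self/mountinfo line: mount ID and parent mount ID.
struct MountEntry
{
  int id;
  int parentId;
};

// Hypothetical helper: walks from `startId` up through parent IDs and
// returns true (filling `error`) if a parent is visited twice -- the
// condition the agent's CHECK turns into an abort().
bool hasParentCycle(
    const std::map<int, MountEntry>& table,
    int startId,
    std::string* error)
{
  std::set<int> visitedParents;
  int current = startId;

  while (table.count(current) != 0) {
    int parentId = table.at(current).parentId;

    if (visitedParents.count(parentId) != 0) {
      *error = "parent mount " + std::to_string(parentId) +
               " visited twice; mount table snapshot is not a tree";
      return true;
    }

    visitedParents.insert(parentId);

    if (parentId == current) {
      break; // The root mount is its own parent; the chain ends here.
    }

    current = parentId;
  }

  return false;
}

int main()
{
  // A hypothetical inconsistent snapshot: entries 41 and 42 point at
  // each other as parents, so the parent chain from 42 loops.
  std::map<int, MountEntry> table =
    {{1, {1, 1}}, {41, {41, 42}}, {42, {42, 41}}};

  std::string error;
  if (hasParentCycle(table, 42, &error)) {
    // Report instead of CHECK-failing, so a caller could retry the read.
    std::cerr << "cycle detected: " << error << std::endl;
  }

  return 0;
}
{noformat}

Returning an error as sketched, rather than aborting, would let the containerizer fail or retry the destroy path gracefully; presumably this is the direction of the fix recorded above for 1.0.2 and 1.1.0.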