[ 
https://issues.apache.org/jira/browse/MESOS-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960461#comment-13960461
 ] 

Ian Downes commented on MESOS-1193:
-----------------------------------

I'm not 100% sure what happened here, but it looks like the 'executor' 
process was killed between forking and exec'ing, i.e., during launch. The 
containerizer then destroyed the container and erased the hashmap entry before 
the deferred exec ran.

I can fix exec to return a failed launch rather than CHECK, but I'm very curious 
how/why the process was killed. Any thoughts, [~tknaup]?
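
To make the race and the proposed fix concrete, here is a minimal sketch. It is 
not the actual Mesos code: SketchContainerizer and LaunchResult are hypothetical 
names, and a plain std::map stands in for the promises hashmap. It only assumes 
that destroy erases the entry and that the deferred exec may run afterwards.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the outcome of a launch step; not a Mesos type.
struct LaunchResult {
  bool ok;
  std::string error;
};

// Hypothetical, simplified containerizer. `promises` holds one entry per
// container whose launch is still in progress.
class SketchContainerizer {
public:
  // Record that a launch has started for this container.
  void launch(const std::string& containerId) { promises[containerId] = true; }

  // Runs when the forked process dies; erases the launch state.
  void destroy(const std::string& containerId) { promises.erase(containerId); }

  // The deferred step. If destroy() already erased the entry, report a
  // failed launch instead of aborting the whole process with a CHECK.
  LaunchResult exec(const std::string& containerId) {
    if (promises.count(containerId) == 0) {
      return {false,
              "Container '" + containerId + "' was destroyed during launch"};
    }
    return {true, ""};
  }

private:
  std::map<std::string, bool> promises;
};
```

With this shape, calling launch, then destroy, then exec on the same container 
id yields a failed LaunchResult for that one container, and the slave keeps 
running.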

> Check failed: promises.contains(containerId) crashes slave
> ----------------------------------------------------------
>
>                 Key: MESOS-1193
>                 URL: https://issues.apache.org/jira/browse/MESOS-1193
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.18.0
>            Reporter: Tobi Knaup
>
> This was observed with four slaves on one machine, one framework (Marathon) 
> and around 100 tasks per slave.
> I0404 17:58:58.298075  3939 mesos_containerizer.cpp:891] Executor for 
> container '6d4de71c-a491-4544-afe0-afcbfa37094a' has exited
> I0404 17:58:58.298395  3938 slave.cpp:2047] Executor 'web_467-1396634277535' 
> of framework 201404041625-3823062160-55371-22555-0000 has terminated with 
> signal Killed
> E0404 17:58:58.298475  3929 slave.cpp:2320] Failed to unmonitor container for 
> executor web_467-1396634277535 of framework 
> 201404041625-3823062160-55371-22555-0000: Not monitored
> I0404 17:58:58.299075  3938 slave.cpp:1643] Handling status update 
> TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 
> from @0.0.0.0:0
> I0404 17:58:58.299232  3932 status_update_manager.cpp:315] Received status 
> update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.299360  3932 status_update_manager.cpp:368] Forwarding status 
> update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 
> to master@144.76.223.227:5050
> I0404 17:58:58.306967  3932 status_update_manager.cpp:393] Received status 
> update acknowledgement (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307049  3932 slave.cpp:2186] Cleaning up executor 
> 'web_467-1396634277535' of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307122  3932 gc.cpp:56] Scheduling 
> '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535/runs/6d4de71c-a491-4544-afe0-afcbfa37094a'
>  for gc 6.99999644578667days in the future
> I0404 17:58:58.307157  3932 gc.cpp:56] Scheduling 
> '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535'
>  for gc 6.99999644553185days in the future
> F0404 17:58:58.597434  3938 mesos_containerizer.cpp:682] Check failed: 
> promises.contains(containerId)
> *** Check failure stack trace: ***
>     @     0x7f5209da6e5d  google::LogMessage::Fail()
>     @     0x7f5209da8c9d  google::LogMessage::SendToLog()
>     @     0x7f5209da6a4c  google::LogMessage::Flush()
>     @     0x7f5209da9599  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f5209ad9f88  
> mesos::internal::slave::MesosContainerizerProcess::exec()
>     @     0x7f5209af3b56  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f5209cd0bf2  process::ProcessManager::resume()
>     @     0x7f5209cd0eec  process::schedule()
>     @     0x7f5208b48f6e  start_thread
>     @     0x7f52088739cd  (unknown)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
