[ 
https://issues.apache.org/jira/browse/MESOS-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-9609:
--------------------------------

    Shepherd:   (was: Benno Evers)
    Assignee:     (was: Greg Mann)

> Master check failure when marking agent unreachable
> ---------------------------------------------------
>
>                 Key: MESOS-9609
>                 URL: https://issues.apache.org/jira/browse/MESOS-9609
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Priority: Critical
>              Labels: foundations, mesosphere
>
> {code}
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815433    13 
> http.cpp:1185] HTTP POST for /master/api/v1/scheduler from 10.142.0.5:55133
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815588    13 
> master.cpp:5467] Processing DECLINE call for offers: [ 
> 5e57f633-a69c-4009-b773-990b4b8984ad-O58323 ] for framework 
> 5e57f633-a69c-4009-b7
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815693    13 
> master.cpp:10703] Removing offer 5e57f633-a69c-4009-b773-990b4b8984ad-O58323
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820142    10 
> master.cpp:8227] Marking agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49 at 
> slave(1)@10.142.0.10:5051 (tf-mesos-agent-t7c8.c.bitcoin-engi
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820367    10 
> registrar.cpp:495] Applied 1 operations in 86528ns; attempting to update the 
> registry
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820572    10 
> registrar.cpp:552] Successfully updated the registry in 175872ns
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820642    11 
> master.cpp:8275] Marked agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49 at 
> slave(1)@10.142.0.10:5051 (tf-mesos-agent-t7c8.c.bitcoin-engin
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820957     9 
> hierarchical.cpp:609] Removed agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49
> Mar 11 10:04:35 research docker[4503]: F0311 10:04:35.851961    11 
> master.cpp:10018] Check failed: 'framework' Must be non NULL
> Mar 11 10:04:35 research docker[4503]: *** Check failure stack trace: ***
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6044a7d  
> google::LogMessage::Fail()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6046830  
> google::LogMessage::SendToLog()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6044663  
> google::LogMessage::Flush()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6047259  
> google::LogMessageFatal::~LogMessageFatal()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5258e14  
> google::CheckNotNull<>()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c521dfc8  
> mesos::internal::master::Master::__removeSlave()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c521f1a2  
> mesos::internal::master::Master::_markUnreachable()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5f98f11  
> process::ProcessBase::consume()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5fb2a4a  
> process::ProcessManager::resume()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5fb65d6  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c35d4c80  (unknown)
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2de76ba  start_thread
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2b1d41d  (unknown)
> Mar 11 10:04:36 research docker[4503]: *** Aborted at 1520762676 (unix time) 
> try "date -d @1520762676" if you are using GNU date ***
> Mar 11 10:04:36 research docker[4503]: PC: @     0x7f96c2a4d196 (unknown)
> Mar 11 10:04:36 research docker[4503]: *** SIGSEGV (@0x0) received by PID 1 
> (TID 0x7f96b986d700) from PID 0; stack trace: ***
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2df1390 (unknown)
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2a4d196 (unknown)
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c604ce2c 
> google::DumpStackTraceAndExit()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6044a7d 
> google::LogMessage::Fail()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6046830 
> google::LogMessage::SendToLog()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6044663 
> google::LogMessage::Flush()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c6047259 
> google::LogMessageFatal::~LogMessageFatal()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5258e14 
> google::CheckNotNull<>()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c521dfc8 
> mesos::internal::master::Master::__removeSlave()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c521f1a2 
> mesos::internal::master::Master::_markUnreachable()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5f98f11 
> process::ProcessBase::consume()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5fb2a4a 
> process::ProcessManager::resume()
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c5fb65d6 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c35d4c80 (unknown)
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2de76ba start_thread
> Mar 11 10:04:36 research docker[4503]: @     0x7f96c2b1d41d (unknown)
> Mar 11 10:04:38 research systemd[1]: mesos-master2.service: main process 
> exited, code=exited, status=139/n/a
> Mar 11 10:04:38 research docker[18886]: mesos-master
> Mar 11 10:04:38 research systemd[1]: Unit mesos-master2.service entered 
> failed state.
> {code}
> Additional case:
> {noformat}
>  I0715 02:56:40.071446    13 master.cpp:1295] Agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150) disconnected
>  I0715 02:56:40.071503    13 master.cpp:3333] Disconnecting agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150)
>  I0715 02:56:40.071527    13 master.cpp:3352] Deactivating agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150)
>  I0715 02:56:40.071563    13 master.cpp:1319] Removing framework 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-0000 (toil) from disconnected agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150) because the framework is not checkpointing
>  I0715 02:56:40.071579    13 master.cpp:11006] Removing framework 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-0000 (toil) from agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150)
>  I0715 02:56:40.071583    12 hierarchical.cpp:829] Agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 deactivated
>  I0715 02:56:40.071619    13 master.cpp:11766] Removing executor 'toil-41' 
> with resources {} of framework 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-0000 on 
> agent 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 at slave(1)@10.0.138.150:5051 
> (10.0.138.150)
>  I0715 02:58:08.642220    12 master.cpp:9130] Marking agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 (10.0.138.150) unreachable: health 
> check timed out
>  I0715 02:58:08.642675    11 registrar.cpp:487] Applied 1 operations in 
> 305592ns; attempting to update the registry
>  I0715 02:58:08.642922    13 registrar.cpp:544] Successfully updated the 
> registry in 187904ns
>  I0715 02:58:08.643081    17 master.cpp:9173] Marked agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0 (10.0.138.150) unreachable: health 
> check timed out
>  F0715 02:58:08.643210    17 master.cpp:11402] Check failed: 'framework' Must 
> be non NULL
>  *** Check failure stack trace: ***
>  I0715 02:58:08.643254    12 hierarchical.cpp:680] Removed agent 
> 9d8dd16c-13f4-4f15-bac8-e5138b2862ee-S0
>      @     0x7ffbcffd090d  google::LogMessage::Fail()
>      @     0x7ffbcffd2748  google::LogMessage::SendToLog()
>      @     0x7ffbcffd04f3  google::LogMessage::Flush()
>      @     0x7ffbcffd31d9  google::LogMessageFatal::~LogMessageFatal()
>      @     0x7ffbcec65024  google::CheckNotNull<>()
>      @     0x7ffbcec32658  mesos::internal::master::Master::__removeSlave()
>      @     0x7ffbcec33b13  mesos::internal::master::Master::_markUnreachable()
>      @     0x7ffbcec33e55  
> _ZNO6lambda12CallableOnceIFN7process6FutureIbEEvEE10CallableFnINS_8internal7PartialIZN5mesos8internal6master6Master15markUnreachableERKNS9_9SlaveInfoEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEUlbE_JbEEEEclEv
>      @     0x7ffbce93d5d8  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureIbEEEclINS0_IFSC_vEEEEESC_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseIbEESt14default_deleteISO_EEOSG_S3_E_ISR_SG_St12_PlaceholderILi1EEEEEEclEOS3_
>      @     0x7ffbcff18371  process::ProcessBase::consume()
>      @     0x7ffbcff3a97a  process::ProcessManager::resume()
>      @     0x7ffbcff3e6a6  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
>      @     0x7ffbcc2cd9e0  (unknown)
>      @     0x7ffbcbde06db  start_thread
>      @     0x7ffbcbb0988f  (unknown)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to