[ 
https://issues.apache.org/jira/browse/MESOS-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957497#comment-16957497
 ] 

Meng Zhu commented on MESOS-10014:
----------------------------------

Hmm, the following log messages look problematic:

{noformat}
I1018 09:05:14.228754 21394 hierarchical.cpp:955] Added agent 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 (ip-172-16-10-17.ec2.internal) with 
cpus:2; mem:1024; disk:1024; ports:[31000-32000] (offered or allocated: {})
I1018 09:05:14.229159 21394 hierarchical.cpp:1100] Grew agent 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 by disk[RAW(,,profile)]:200 (total), {  
} (used)
I1018 09:05:14.229632 21394 hierarchical.cpp:1057] Agent 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 (ip-172-16-10-17.ec2.internal) updated 
with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
I1018 09:05:14.230063 21394 hierarchical.cpp:1843] Performed allocation for 1 
agents in 128843ns
I1018 09:05:14.230569 21391 master.cpp:10926] Recovered orphan operation 
71647a26-b5fe-4b97-9162-0abb2785b909 (ID: operation) on agent 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 belonging to framework 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000 in state OPERATION_PENDING
I1018 09:05:14.230813 21391 master.cpp:10824] Adding framework 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000 (default) with roles {  } suppressed
I1018 09:05:14.230991 21391 master.cpp:8295] Updating framework 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000 (default) with roles {  } suppressed
I1018 09:05:14.231298 21390 hierarchical.cpp:1100] Grew agent 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 by disk[RAW(,,profile)]:200 (total), { 
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000: disk(allocated: 
default-role)[RAW(,,profile)]:200 } (used)
{noformat}

This happens after the master failover. In particular, there are two `Grew 
agent ...` lines, indicating that two resource providers (each with 200 disk) 
were added. The latter one reports 200 disk as *used*, allocated to 
`default-role`. This is probably the same 200 disk resource printed out above 
by [~bmahler].

I suspect this relates to orphan operations. cc [~greggomann]
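
To make the observation about the two `Grew agent ...` lines easier to check against the full attached log, here is a minimal sketch of a log scan. It is not part of Mesos; the helper names (`join_wrapped`, `grew_events`) and the regular expressions are assumptions based only on the excerpt above, and it assumes the log was wrapped at spaces as in this message.

```python
import re

# Assumed glog-style prefix, e.g. "I1018 09:05:14.229159 " (taken from the excerpt).
ENTRY_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+ ")
# Assumed shape of the allocator's "Grew agent" message, as seen in the excerpt.
GREW = re.compile(r"Grew agent (\S+) by (.*?) \(total\), \{(.*?)\} \(used\)")


def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    # Collapse runs of whitespace left over from the wrapping.
    return [" ".join(entry.split()) for entry in entries]


def grew_events(lines):
    """Return one dict per 'Grew agent' entry: agent ID, total, used."""
    events = []
    for entry in join_wrapped(lines):
        match = GREW.search(entry)
        if match:
            events.append({
                "agent": match.group(1),
                "total": match.group(2),
                "used": match.group(3).strip(),
            })
    return events


# The two "Grew agent" entries copied from the excerpt above.
SAMPLE_LOG = """\
I1018 09:05:14.229159 21394 hierarchical.cpp:1100] Grew agent
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 by disk[RAW(,,profile)]:200 (total), {
} (used)
I1018 09:05:14.231298 21390 hierarchical.cpp:1100] Grew agent
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-S0 by disk[RAW(,,profile)]:200 (total), {
e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000: disk(allocated:
default-role)[RAW(,,profile)]:200 } (used)
"""

if __name__ == "__main__":
    for event in grew_events(SAMPLE_LOG.splitlines()):
        flag = "USED" if event["used"] else "unused"
        print(event["agent"], event["total"], flag)
```

Running the same scan over the attached bad-run log would list every `Grew agent` entry and flag the ones with a non-empty used set.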

> `tryUntrackFrameworkUnderRole` check failed in 
> `HierarchicalAllocatorProcess::removeFramework`.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10014
>                 URL: https://issues.apache.org/jira/browse/MESOS-10014
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, test
>    Affects Versions: 1.10
>            Reporter: Andrei Budnik
>            Priority: Major
>              Labels: flaky-test, resource-management
>         Attachments: AgentPendingOperationAfterMasterFailover-badrun.txt
>
>
> `ContentType/OperationReconciliationTest.AgentPendingOperationAfterMasterFailover/0`
>  test failed:
> {code:java}
> F1018 09:05:14.310616 21391 hierarchical.cpp:745] Check failed: 
> tryUntrackFrameworkUnderRole(framework, role)  Framework: 
> e6284079-cb6a-4a47-8f9a-ea9b84ff622a-0000 role: default-role
> *** Check failure stack trace: ***
>     @     0x7f40fff0a1f6  google::LogMessage::Fail()
>     @     0x7f40fff0a14f  google::LogMessage::SendToLog()
>     @     0x7f40fff09a91  google::LogMessage::Flush()
>     @     0x7f40fff0d12f  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f410fd828ac  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::removeFramework()
>     @          0x186b29f  
> _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES8_EEvRKNS_3PIDIT_EEMSA_FvT0_EOT1_ENKUlOS6_PNS_11ProcessBaseEE_clESJ_SL_
>     @          0x189c273  
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS3_11FrameworkIDESA_EEvRKNS1_3PIDIT_EEMSC_FvT0_EOT1_EUlOS8_PNS1_11ProcessBaseEE_JS8_SN_EEEDTclcl7forwardISC_Efp_Espcl7forwardIT0_Efp0_EEEOSC_DpOSP_
>     @          0x18990b7  
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDESB_EEvRKNS2_3PIDIT_EEMSD_FvT0_EOT1_EUlOS9_PNS2_11ProcessBaseEE_JS9_St12_PlaceholderILi1EEEE13invoke_expandISP_St5tupleIJS9_SR_EESU_IJOSO_EEJLm0ELm1EEEEDTcl6invokecl7forwardISD_Efp_Espcl6expandcl3getIXT2_EEcl7forwardISH_Efp0_EEcl7forwardISK_Efp2_EEEEOSD_OSH_N5cpp1416integer_sequenceImJXspT2_EEEESL_
>     @          0x1896100  
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDESB_EEvRKNS2_3PIDIT_EEMSD_FvT0_EOT1_EUlOS9_PNS2_11ProcessBaseEE_IS9_St12_PlaceholderILi1EEEEclIISO_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImILm0ELm1EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOSX_
>     @          0x1895174  
> _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS6_11FrameworkIDESD_EEvRKNS4_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS4_11ProcessBaseEE_ISB_St12_PlaceholderILi1EEEEEISQ_EEEDTclcl7forwardISF_Efp_Espcl7forwardIT0_Efp0_EEEOSF_DpOSV_
>     @          0x1894b2b  
> _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS7_11FrameworkIDESE_EEvRKNS5_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS5_11ProcessBaseEE_JSC_St12_PlaceholderILi1EEEEEJSR_EEEvOSG_DpOT0_
>     @          0x18943bc  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNSA_11FrameworkIDESH_EEvRKNS1_3PIDIT_EEMSJ_FvT0_EOT1_EUlOSF_S3_E_ISF_St12_PlaceholderILi1EEEEEEclEOS3_
>     @     0x7f41016deb22  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
>     @     0x7f410169620c  process::ProcessBase::consume()
>     @     0x7f41016c0696  
> _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
>     @          0x1822baa  process::ProcessBase::serve()
>     @     0x7f4101692af1  process::ProcessManager::resume()
>     @     0x7f410168ed68  
> _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
>     @     0x7f41016b81e2  
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x7f41016b7244  
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEclEv
>     @     0x7f41016b6088  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
>     @     0x7f40fca44590  execute_native_thread_routine
>     @     0x7f40ffa77e25  start_thread
>     @     0x7f40fa396bad  __clone
>     @              (nil)  (unknown)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
