[jira] [Updated] (MESOS-8346) Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed

2017-12-20 Thread Jan Schlicht (JIRA)

 [ https://issues.apache.org/jira/browse/MESOS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Schlicht updated MESOS-8346:

Shepherd: Benjamin Bannier

> Resubscription of a resource provider will crash the agent if its HTTP 
> connection isn't closed
> --
>
>              Key: MESOS-8346
>              URL: https://issues.apache.org/jira/browse/MESOS-8346
>          Project: Mesos
>       Issue Type: Bug
> Affects Versions: 1.5.0
>         Reporter: Jan Schlicht
>         Assignee: Jan Schlicht
>         Priority: Blocker
>           Labels: mesosphere
>
> A resource provider might resubscribe while its old HTTP connection wasn't 
> properly closed. In that case an agent will crash with, e.g., the following 
> log:
> {noformat}
> I1219 13:33:51.937295 128610304 manager.cpp:570] Subscribing resource 
> provider 
> {"id":{"value":"8e71beef-796e-4bde-9257-952ed0f230a5"},"name":"test","type":"org.apache.mesos.rp.test"}
> I1219 13:33:51.937443 128610304 manager.cpp:134] Terminating resource 
> provider 8e71beef-796e-4bde-9257-952ed0f230a5
> I1219 13:33:51.937760 128610304 manager.cpp:134] Terminating resource 
> provider 8e71beef-796e-4bde-9257-952ed0f230a5
> E1219 13:33:51.937851 129683456 http_connection.hpp:445] End-Of-File received
> I1219 13:33:51.937865 131293184 slave.cpp:7105] Handling resource provider 
> message 'DISCONNECT: resource provider 8e71beef-796e-4bde-9257-952ed0f230a5'
> I1219 13:33:51.937968 131293184 slave.cpp:7347] Forwarding new total 
> resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> F1219 13:33:51.938052 132366336 manager.cpp:606] Check failed: 
> resourceProviders.subscribed.contains(resourceProviderId) 
> *** Check failure stack trace: ***
> E1219 13:33:51.938583 130756608 http_connection.hpp:445] End-Of-File received
> I1219 13:33:51.938987 129683456 hierarchical.cpp:669] Agent 
> 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 (172.18.8.13) updated with total 
> resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> @0x1125380ef  google::LogMessageFatal::~LogMessageFatal()
> @0x112534ae9  google::LogMessageFatal::~LogMessageFatal()
> I1219 13:33:51.939131 129683456 hierarchical.cpp:1517] Performed allocation 
> for 1 agents in 61830ns
> I1219 13:33:51.945793 2646795072 slave.cpp:927] Agent terminating
> I1219 13:33:51.945955 129146880 master.cpp:1305] Agent 
> 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 
> (172.18.8.13) disconnected
> I1219 13:33:51.945979 129146880 master.cpp:3364] Disconnecting agent 
> 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 
> (172.18.8.13)
> I1219 13:33:51.946022 129146880 master.cpp:3383] Deactivating agent 
> 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 
> (172.18.8.13)
> I1219 13:33:51.946081 131293184 hierarchical.cpp:766] Agent 
> 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 deactivated
> @0x115f2761d  
> mesos::internal::ResourceProviderManagerProcess::subscribe()::$_2::operator()()
> @0x115f2977d  
> _ZN5cpp176invokeIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS2_14HttpConnectionERKNS1_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSG_DpOSH_
> @0x115f29740  
> _ZN6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS3_14HttpConnectionERKNS2_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7Nothing13invoke_expandISC_NSt3__15tupleIJSG_EEENSK_IJEEEJLm0DTclsr5cpp17E6invokeclsr3stdE7forwardIT_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_OSN_OSO_N5cpp1416integer_sequenceImJXspT2_OSP_
> @0x115f296bb  
> _ZNO6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS3_14HttpConnectionERKNS2_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingclIJEEEDTcl13invoke_expandclL_ZNSt3__14moveIRSC_EEONSJ_16remove_referenceIT_E4typeEOSN_EdtdefpT1fEclL_ZNSK_IRNSJ_5tupleIJSG_ESQ_SR_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOSY_
> @0x115f2965d  
> _ZN5cpp176invokeIN6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS5_14HttpConnectionERKNS4_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEJEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSK_DpOSL_
> @0x115f29631  
> _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS6_14HttpConnectionERKNS5_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEJEEEvOT_DpOT0_
> @
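The fatal line in the log above is a failed check on
resourceProviders.subscribed.contains(resourceProviderId), raised from a
continuation inside ResourceProviderManagerProcess::subscribe(). The sketch
below shows one way such a check can fail. It is an illustration under
assumptions, not the Mesos code: ToyManager, subscribe, closeAll,
simulateResubscription and the "guarded" flag are all hypothetical names, and
the model assumes that close handlers are keyed only by provider ID. In that
model, the close handler of the stale connection removes the entry that now
belongs to the new connection, and the second handler then finds nothing to
remove.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource provider manager. This is NOT the
// Mesos implementation; every name below is invented for illustration.
struct ToyManager {
  // Provider ID -> ID of the connection currently serving that provider.
  std::map<std::string, int> subscribed;

  // Close handlers queued for later execution, mimicking asynchronously
  // delivered "connection closed" notifications. A handler returns false
  // at the point where a real CHECK would abort the process.
  std::vector<std::function<bool()>> closeHandlers;

  int nextConnection = 0;

  void subscribe(const std::string& id, bool guarded) {
    const int connection = nextConnection++;

    // A resubscription overwrites the entry, but the close handler of the
    // previous connection stays queued.
    subscribed[id] = connection;

    closeHandlers.push_back([this, id, connection, guarded]() {
      auto it = subscribed.find(id);

      if (guarded) {
        // Only clean up if the entry still belongs to this connection;
        // a stale handler becomes a no-op.
        if (it != subscribed.end() && it->second == connection) {
          subscribed.erase(it);
        }
        return true;
      }

      // Unguarded: assumes the provider is still subscribed.
      if (it == subscribed.end()) {
        return false;
      }
      subscribed.erase(it);
      return true;
    });
  }

  // Runs every queued close handler; returns false if any of them hit the
  // violated assumption.
  bool closeAll() {
    std::vector<std::function<bool()>> handlers = std::move(closeHandlers);
    closeHandlers.clear();
    bool ok = true;
    for (const auto& handler : handlers) {
      ok = handler() && ok;
    }
    return ok;
  }
};

// Subscribes the same provider twice without closing the first connection,
// then delivers both close notifications.
bool simulateResubscription(bool guarded) {
  ToyManager manager;
  manager.subscribe("rp", guarded);
  manager.subscribe("rp", guarded);
  assert(manager.closeHandlers.size() == 2);
  return manager.closeAll();
}
```

In this toy model simulateResubscription(false) returns false, because the
second close handler finds the provider already removed, while
simulateResubscription(true) returns true, because the stale handler does
nothing. Whether the actual fix for MESOS-8346 takes this shape is not stated
in the report.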

[jira] [Updated] (MESOS-8346) Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed

2017-12-21 Thread Benjamin Bannier (JIRA)

 [ https://issues.apache.org/jira/browse/MESOS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-8346:

  Sprint: Mesosphere Sprint 70
Story Points: 2

> Resubscription of a resource provider will crash the agent if its HTTP 
> connection isn't closed
> --
>
>              Key: MESOS-8346
>              URL: https://issues.apache.org/jira/browse/MESOS-8346
>          Project: Mesos
>       Issue Type: Bug
> Affects Versions: 1.5.0
>         Reporter: Jan Schlicht
>         Assignee: Jan Schlicht
>         Priority: Blocker
>           Labels: mesosphere
>          Fix For: 1.5.0
>