[jira] [Updated] (MESOS-8346) Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed
[ https://issues.apache.org/jira/browse/MESOS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Schlicht updated MESOS-8346:
--------------------------------
    Shepherd: Benjamin Bannier

> Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8346
>                 URL: https://issues.apache.org/jira/browse/MESOS-8346
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Jan Schlicht
>            Assignee: Jan Schlicht
>            Priority: Blocker
>              Labels: mesosphere
>
> A resource provider might resubscribe while its old HTTP connection wasn't properly closed. In that case an agent will crash with, e.g., the following log:
> {noformat}
> I1219 13:33:51.937295 128610304 manager.cpp:570] Subscribing resource provider {"id":{"value":"8e71beef-796e-4bde-9257-952ed0f230a5"},"name":"test","type":"org.apache.mesos.rp.test"}
> I1219 13:33:51.937443 128610304 manager.cpp:134] Terminating resource provider 8e71beef-796e-4bde-9257-952ed0f230a5
> I1219 13:33:51.937760 128610304 manager.cpp:134] Terminating resource provider 8e71beef-796e-4bde-9257-952ed0f230a5
> E1219 13:33:51.937851 129683456 http_connection.hpp:445] End-Of-File received
> I1219 13:33:51.937865 131293184 slave.cpp:7105] Handling resource provider message 'DISCONNECT: resource provider 8e71beef-796e-4bde-9257-952ed0f230a5'
> I1219 13:33:51.937968 131293184 slave.cpp:7347] Forwarding new total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> F1219 13:33:51.938052 132366336 manager.cpp:606] Check failed: resourceProviders.subscribed.contains(resourceProviderId)
> *** Check failure stack trace: ***
> E1219 13:33:51.938583 130756608 http_connection.hpp:445] End-Of-File received
> I1219 13:33:51.938987 129683456 hierarchical.cpp:669] Agent 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 (172.18.8.13) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> @0x1125380ef google::LogMessageFatal::~LogMessageFatal()
> @0x112534ae9 google::LogMessageFatal::~LogMessageFatal()
> I1219 13:33:51.939131 129683456 hierarchical.cpp:1517] Performed allocation for 1 agents in 61830ns
> I1219 13:33:51.945793 2646795072 slave.cpp:927] Agent terminating
> I1219 13:33:51.945955 129146880 master.cpp:1305] Agent 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 (172.18.8.13) disconnected
> I1219 13:33:51.945979 129146880 master.cpp:3364] Disconnecting agent 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 (172.18.8.13)
> I1219 13:33:51.946022 129146880 master.cpp:3383] Deactivating agent 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 at slave(1)@172.18.8.13:64430 (172.18.8.13)
> I1219 13:33:51.946081 131293184 hierarchical.cpp:766] Agent 0019c3fa-28c5-43a9-88d0-709eee271c62-S0 deactivated
> @0x115f2761d mesos::internal::ResourceProviderManagerProcess::subscribe()::$_2::operator()()
> @0x115f2977d _ZN5cpp176invokeIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS2_14HttpConnectionERKNS1_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSG_DpOSH_
> @0x115f29740 _ZN6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS3_14HttpConnectionERKNS2_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7Nothing13invoke_expandISC_NSt3__15tupleIJSG_EEENSK_IJEEEJLm0DTclsr5cpp17E6invokeclsr3stdE7forwardIT_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_OSN_OSO_N5cpp1416integer_sequenceImJXspT2_OSP_
> @0x115f296bb _ZNO6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS3_14HttpConnectionERKNS2_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingclIJEEEDTcl13invoke_expandclL_ZNSt3__14moveIRSC_EEONSJ_16remove_referenceIT_E4typeEOSN_EdtdefpT1fEclL_ZNSK_IRNSJ_5tupleIJSG_ESQ_SR_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOSY_
> @0x115f2965d _ZN5cpp176invokeIN6lambda8internal7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS5_14HttpConnectionERKNS4_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEJEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSK_DpOSL_
> @0x115f29631 _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN5mesos8internal30ResourceProviderManagerProcess9subscribeERKNS6_14HttpConnectionERKNS5_17resource_provider14Call_SubscribeEE3$_2JN7process6FutureI7NothingEJEEEvOT_DpOT0_
> @
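
One plausible reading of the report is that a connection-closed callback registered for the old HTTP connection still runs after the provider has been terminated or has resubscribed, at which point the `contains` check no longer holds. The following is a minimal, self-contained sketch of guarding such a callback against stale connections. It is not the Mesos implementation and not necessarily the fix applied for this issue; `Manager`, `Subscriber`, and `connectionId` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a subscribed resource provider (not Mesos code).
struct Subscriber {
  int connectionId;  // Identifies the HTTP connection this entry belongs to.
};

// Hypothetical, single-threaded stand-in for a resource provider manager.
class Manager {
public:
  // Registers (or re-registers) a provider and returns the callback that
  // would run once the given connection is closed. The callback returns
  // true if it removed the subscription and false if it was stale.
  std::function<bool()> subscribe(const std::string& id, int connectionId) {
    subscribed[id] = Subscriber{connectionId};

    return [this, id, connectionId]() {
      auto it = subscribed.find(id);

      // Guard instead of asserting: do nothing if the provider is already
      // gone or has resubscribed over a different connection.
      if (it == subscribed.end() ||
          it->second.connectionId != connectionId) {
        return false;
      }

      subscribed.erase(it);
      return true;
    };
  }

  bool contains(const std::string& id) const {
    return subscribed.count(id) > 0;
  }

private:
  std::unordered_map<std::string, Subscriber> subscribed;
};
```

In this sketch, a close event for the old connection that arrives after a resubscription leaves the new subscription untouched, and a second close event for the same connection is a no-op rather than a failed check.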
[jira] [Updated] (MESOS-8346) Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed
[ https://issues.apache.org/jira/browse/MESOS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-8346:
------------------------------------
          Sprint: Mesosphere Sprint 70
    Story Points: 2

> Resubscription of a resource provider will crash the agent if its HTTP connection isn't closed
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8346
>                 URL: https://issues.apache.org/jira/browse/MESOS-8346
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Jan Schlicht
>            Assignee: Jan Schlicht
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 1.5.0
>
> A resource provider might resubscribe while its old HTTP connection wasn't properly closed. In that case an agent will crash.