[ https://issues.apache.org/jira/browse/MESOS-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gastón Kleiman updated MESOS-7766:
----------------------------------
    Sprint: Mesosphere Sprint 59

> Segfault when trying to accept inverse offer with unknown offerId
> -----------------------------------------------------------------
>
>                 Key: MESOS-7766
>                 URL: https://issues.apache.org/jira/browse/MESOS-7766
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.4, 1.1.1
>            Reporter: Benjamin Bannier
>            Assignee: Alexander Rukletsov
>              Labels: mesosphere
>
> We just saw the following in a test cluster:
> {noformat}
> W0707 06:30:10.172188 9413 master.cpp:3939] Ignoring accept of inverse offer abd00119-7353-4990-9cc5-0d6bd69a91e7-O737973 since it is no longer valid
> F0707 06:30:10.172236 9413 master.cpp:3943] CHECK_SOME(slaveId): is NONE
> *** Check failure stack trace: ***
>     @ 0x7f425b1521ed google::LogMessage::Fail()
>     @ 0x7f425b15401d google::LogMessage::SendToLog()
>     @ 0x7f425b151ddc google::LogMessage::Flush()
>     @ 0x7f425b154919 google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7f425a564ce9 _CheckFatal::~_CheckFatal()
>     @ 0x7f425a76a69d mesos::internal::master::Master::acceptInverseOffers()
>     @ 0x7f425a6e360e mesos::internal::master::Master::Http::scheduler()
>     @ 0x7f425a737347 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestERK6OptionISsEEZN5mesos8internal6master6Master10initializeEvEUlS7_SB_E1_E9_M_invokeERKSt9_Any_dataS7_SB_
>     @ 0x7f425b0d7413 _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultEEEEE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
>     @ 0x7f425b0e1091 process::ProcessManager::resume()
>     @ 0x7f425b0e1397 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @ 0x7f4259770d73 (unknown)
>     @ 0x7f4258f6d52c (unknown)
>     @ 0x7f4258cab1dd (unknown)
> {noformat}
> This seems to happen for cases where we try to accept an invalid inverse offer and incorrectly assume that we can always extract an agent id,
> {code}
>   Option<SlaveID> slaveId;
>
>   // Update each inverse offer in the allocator with the accept and
>   // filter.
>   foreach (const OfferID& offerId, accept.inverse_offer_ids()) {
>     InverseOffer* inverseOffer = getInverseOffer(offerId);
>     if (inverseOffer != nullptr) {
>       CHECK(inverseOffer->has_slave_id());
>       slaveId = inverseOffer->slave_id();
>
>       mesos::allocator::InverseOfferStatus status;
>       status.set_status(mesos::allocator::InverseOfferStatus::ACCEPT);
>       status.mutable_framework_id()->CopyFrom(inverseOffer->framework_id());
>       status.mutable_timestamp()->CopyFrom(protobuf::getCurrentTime());
>
>       allocator->updateInverseOffer(
>           inverseOffer->slave_id(),
>           inverseOffer->framework_id(),
>           UnavailableResources{
>               inverseOffer->resources(),
>               inverseOffer->unavailability()},
>           status,
>           accept.filters());
>
>       removeInverseOffer(inverseOffer);
>       continue;
>     }
>
>     // If the offer was not in our inverse offer set, then this
>     // offer is no longer valid.
>     LOG(WARNING) << "Ignoring accept of inverse offer " << offerId
>                  << " since it is no longer valid";
>   }
>
>   CHECK_SOME(slaveId);
> {code}
> If {{offerId}} is invalid, {{slaveId}} will never be set to a value, causing the {{CHECK_SOME}} to fail.
> I see this issue in 1.0.4 and 1.1.1; the problematic code seems to be gone in 1.1.2.
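
The failure mode described above can be reproduced outside of Mesos with a small standalone sketch. This is not the actual patch and does not use Mesos or stout types: {{InverseOffer}}, {{InverseOffers}} and this free-standing {{acceptInverseOffers}} are hypothetical stand-ins, and {{std::optional}} plays the role of {{Option<SlaveID>}}. It only illustrates one possible way to treat "none of the offer ids resolved" as an ordinary outcome rather than a fatal check.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the master's inverse offer record.
struct InverseOffer {
  std::string slaveId;
};

// Hypothetical stand-in for the master's outstanding inverse offers,
// keyed by offer id.
using InverseOffers = std::map<std::string, InverseOffer>;

// Processes an accept call. Returns the agent id of the last inverse offer
// that was still outstanding, or std::nullopt if none of the ids resolved.
std::optional<std::string> acceptInverseOffers(
    InverseOffers& outstanding,
    const std::vector<std::string>& offerIds)
{
  std::optional<std::string> slaveId;

  for (const std::string& offerId : offerIds) {
    auto it = outstanding.find(offerId);
    if (it != outstanding.end()) {
      // Mirrors CHECK(inverseOffer->has_slave_id()) from the report.
      assert(!it->second.slaveId.empty());
      slaveId = it->second.slaveId;
      outstanding.erase(it);
      continue;
    }

    // Unknown or expired offer id: warn and move on.
    std::cerr << "Ignoring accept of inverse offer " << offerId
              << " since it is no longer valid" << std::endl;
  }

  // No unconditional check here: the value stays unset when every id
  // was invalid, and the caller decides what to do with that.
  return slaveId;
}
```

A caller in this sketch would test the returned optional before using the agent id, so an accept containing only stale offer ids leaves the outstanding set untouched and does not abort the process.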