[ 
https://issues.apache.org/jira/browse/MESOS-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-7766:
----------------------------------
    Sprint: Mesosphere Sprint 59

> Segfault when trying to accept inverse offer with unknown offerId
> -----------------------------------------------------------------
>
>                 Key: MESOS-7766
>                 URL: https://issues.apache.org/jira/browse/MESOS-7766
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.4, 1.1.1
>            Reporter: Benjamin Bannier
>            Assignee: Alexander Rukletsov
>              Labels: mesosphere
>
> We just saw the following in a test cluster:
> {noformat}
> W0707 06:30:10.172188  9413 master.cpp:3939] Ignoring accept of inverse offer abd00119-7353-4990-9cc5-0d6bd69a91e7-O737973 since it is no longer valid
> F0707 06:30:10.172236  9413 master.cpp:3943] CHECK_SOME(slaveId): is NONE
> *** Check failure stack trace: ***
>     @     0x7f425b1521ed  google::LogMessage::Fail()
>     @     0x7f425b15401d  google::LogMessage::SendToLog()
>     @     0x7f425b151ddc  google::LogMessage::Flush()
>     @     0x7f425b154919  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f425a564ce9  _CheckFatal::~_CheckFatal()
>     @     0x7f425a76a69d  mesos::internal::master::Master::acceptInverseOffers()
>     @     0x7f425a6e360e  mesos::internal::master::Master::Http::scheduler()
>     @     0x7f425a737347  _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestERK6OptionISsEEZN5mesos8internal6master6Master10initializeEvEUlS7_SB_E1_E9_M_invokeERKSt9_Any_dataS7_SB_
>     @     0x7f425b0d7413  _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultEEEEE0_clESC_ENKUlRKNS4_IbEEE1_clESG_
>     @     0x7f425b0e1091  process::ProcessManager::resume()
>     @     0x7f425b0e1397  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7f4259770d73  (unknown)
>     @     0x7f4258f6d52c  (unknown)
>     @     0x7f4258cab1dd  (unknown)
> {noformat}
> This seems to happen when we try to accept an invalid inverse offer and 
> incorrectly assume that we can always extract an agent ID:
> {code}
> Option<SlaveID> slaveId;
> // Update each inverse offer in the allocator with the accept and
> // filter.
> foreach (const OfferID& offerId, accept.inverse_offer_ids()) {
>   InverseOffer* inverseOffer = getInverseOffer(offerId);
>   if (inverseOffer != nullptr) {
>     CHECK(inverseOffer->has_slave_id());
>     slaveId = inverseOffer->slave_id();
>     mesos::allocator::InverseOfferStatus status;
>     status.set_status(mesos::allocator::InverseOfferStatus::ACCEPT);
>     status.mutable_framework_id()->CopyFrom(inverseOffer->framework_id());
>     status.mutable_timestamp()->CopyFrom(protobuf::getCurrentTime());
>     allocator->updateInverseOffer(
>         inverseOffer->slave_id(),
>         inverseOffer->framework_id(),
>         UnavailableResources{
>             inverseOffer->resources(),
>             inverseOffer->unavailability()},
>         status,
>         accept.filters());
>     removeInverseOffer(inverseOffer);
>     continue;
>   }
>   // If the offer was not in our inverse offer set, then this
>   // offer is no longer valid.
>   LOG(WARNING) << "Ignoring accept of inverse offer " << offerId
>                << " since it is no longer valid";
> }
> CHECK_SOME(slaveId);
> {code}
> If every {{offerId}} in the call is invalid, {{slaveId}} is never set to a 
> value, causing the {{CHECK_SOME}} to fail.
> I see this issue in 1.0.4 and 1.1.1; the problematic code seems to be gone in 
> 1.1.2.
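
The failure mode described above can be avoided by treating "no valid inverse offer" as an ordinary outcome instead of asserting on it. The following is a minimal, self-contained sketch of that idea, not the change that was actually made upstream. It assumes C++17 and uses std::optional in place of stout's Option; the InverseOffer struct, the InverseOffers map, and the free function acceptInverseOffers are simplified, hypothetical stand-ins for the real Mesos types and the Master method.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for mesos::InverseOffer.
struct InverseOffer {
  std::string slaveId;
  std::string frameworkId;
};

// Hypothetical stand-in for the master's table of outstanding inverse
// offers, keyed by offer ID.
using InverseOffers = std::map<std::string, InverseOffer>;

// Processes an accept call. Returns the agent ID of the last valid
// inverse offer, or std::nullopt if none of the offer IDs was known.
std::optional<std::string> acceptInverseOffers(
    InverseOffers& outstanding,
    const std::vector<std::string>& offerIds)
{
  std::optional<std::string> slaveId;

  for (const std::string& offerId : offerIds) {
    auto it = outstanding.find(offerId);
    if (it == outstanding.end()) {
      // Unknown or expired offer: skip it. The quoted code logs a
      // warning at this point.
      continue;
    }

    // Mirrors CHECK(inverseOffer->has_slave_id()) in the quoted code.
    assert(!it->second.slaveId.empty());
    slaveId = it->second.slaveId;

    // The quoted code updates the allocator here before removing
    // the inverse offer.
    outstanding.erase(it);
  }

  // No CHECK_SOME here: the caller decides what to do when no offer
  // in the call was valid.
  return slaveId;
}
```

With this shape, a call that names only unknown offer IDs returns an empty optional, and the caller can skip any follow-up work that needs the agent ID rather than aborting the process.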



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
