[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496526#comment-16496526 ]
Benno Evers commented on MESOS-7966:
------------------------------------

> I wasn't aware that Marathon had its own reasons for doing dynamic
> reservations. Do you have any details you can share on why it does or a link
> to some code?

I was just basing this on the following log line, and the fact that Marathon is the only framework ever mentioned as receiving inverse offers.

{noformat}
I0502 15:00:57.588295 20632 master.cpp:7769] Sending 1 inverse offers to framework 487b53f1-1a44-44b5-bf9f-24790937b51a-0001 (marathon1) at scheduler-e96a9f61-720c-4c0c-9018-60224ab59031@10.65.137.102:40886
{noformat}

Actually, on re-reading the allocator code, it seems that it is enough for a framework to use any resources on the host scheduled for maintenance, so the focus on reservations was probably a bit of a red herring. It shouldn't change anything about the underlying race, though.

> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
>                 Key: MESOS-7966
>                 URL: https://issues.apache.org/jira/browse/MESOS-7966
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.1.0
>            Reporter: Rob Johnson
>            Assignee: Benno Evers
>            Priority: Critical
>              Labels: mesosphere, reliability
>
> We interact with the maintenance API frequently to orchestrate gracefully
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with
> the API. This happens relatively frequently, and impacts us when downstream
> frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're
> happy to provide any other logs you need - please let me know what would be
> useful for debugging.
> Thanks.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)