[jira] [Commented] (MESOS-7966) check for maintenance on agent causes fatal error

Greg Mann (JIRA) Wed, 11 Jul 2018 14:35:55 -0700


    [ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540703#comment-16540703 ]


Greg Mann commented on MESOS-7966:
----------------------------------

[~kaysoky] ping :)

There is some interest in the community in having this backported, and it 
would be great to make that happen. Let me know if you don't have cycles and 
would like some assistance!

> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
>                 Key: MESOS-7966
>                 URL: https://issues.apache.org/jira/browse/MESOS-7966
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.1.3, 1.2.3, 1.3.2, 1.4.1, 1.5.0, 1.6.0
>            Reporter: Rob Johnson
>            Assignee: Benno Evers
>            Priority: Critical
>              Labels: mesosphere, reliability
>             Fix For: 1.7.0
>
>
> We interact with the maintenance API frequently to orchestrate gracefully 
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with 
> the API. This happens relatively frequently, and impacts us when downstream 
> frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're 
> happy to provide any other logs you need - please let me know what would be 
> useful for debugging.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
