[ https://issues.apache.org/jira/browse/MESOS-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944691#comment-16944691 ]
Meng Zhu commented on MESOS-10006:
----------------------------------
A debug patch has landed on master and on the 1.9.x and 1.8.x branches (it will
be included in 1.9.1 and 1.8.2):
{noformat}
commit 3457771b42993c85e3da3c4550b233f61b14bc99 (origin/master, apache/master, master, check_slaveID)
Author: Meng Zhu <[email protected]>
Date:   Fri Oct 4 10:48:40 2019 -0400

    Made `CHECK` in sorter print out more info upon failure.

    Review: https://reviews.apache.org/r/71581
{noformat}
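For readers unfamiliar with the idea behind such a debug patch, here is a minimal, self-contained sketch of a containment check that reports extra context when it fails. It is not the code from the commit above: the `AgentResources` alias and the `describeMissing` and `checkedLookup` helpers are hypothetical names invented for illustration, and the sketch throws an exception rather than aborting so that it can be exercised without glog. In glog-based code, the same effect is usually achieved by streaming context into the check, as in `CHECK(cond) << "details"`.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the sorter's per-agent bookkeeping:
// agent ID -> allocated amount (simplified here to a single double).
using AgentResources = std::map<std::string, double>;

// Builds the diagnostic text attached to a failed containment check.
// It names the missing agent ID and lists the IDs that are tracked,
// which is the kind of detail a bare "Check failed" line omits.
std::string describeMissing(
    const AgentResources& resources,
    const std::string& slaveId)
{
  std::ostringstream out;
  out << "Check failed: resources.contains(slaveId); slaveId='" << slaveId
      << "', known agents (" << resources.size() << "):";
  for (const auto& entry : resources) {
    out << " " << entry.first;
  }
  return out.str();
}

// Looks up an agent and fails loudly, with context, if it is absent.
// With glog this would roughly correspond to
//   CHECK(resources.count(slaveId) > 0) << "...context...";
// an exception is used here only to keep the sketch dependency-free.
double checkedLookup(
    const AgentResources& resources,
    const std::string& slaveId)
{
  auto it = resources.find(slaveId);
  if (it == resources.end()) {
    throw std::logic_error(describeMissing(resources, slaveId));
  }
  return it->second;
}
```

Calling `checkedLookup` with an ID that is not in the map produces a message that includes both the offending ID and the set of tracked IDs, which is the sort of information needed to tell whether the agent was never added or was removed too early.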
> Crash in Sorter: "Check failed: resources.contains(slaveId)"
> ------------------------------------------------------------
>
> Key: MESOS-10006
> URL: https://issues.apache.org/jira/browse/MESOS-10006
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.0, 1.4.1, 1.9.0
> Environment: Ubuntu Bionic 18.04, Mesos 1.1.0, 1.4.1, 1.9.0 (logs are
> from 1.9.0).
> Reporter: Terra Field
> Priority: Major
> Attachments: mesos-master.log.gz
>
>
> We've hit the same check failure on three different versions of the Mesos
> master (the file name and line number change, but the failed check is the
> same), usually under very high load:
> {noformat}
> F1003 22:06:54.463502 8579 sorter.hpp:339] Check failed: resources.contains(slaveId)
> {noformat}
> This particular occurrence happened after the election of a new master that
> was then stuck doing framework update broadcasts, as documented in
> MESOS-10005.
>