[ https://issues.apache.org/jira/browse/MESOS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988199#comment-14988199 ]
Neil Conway edited comment on MESOS-3280 at 11/3/15 9:47 PM:
-------------------------------------------------------------

Fix merged in 82b6112cabc838f9bfa, should be in 0.26

was (Author: neilc):
Merged in 82b6112cabc838f9bfa.

> Master fails to access replicated log after network partition
> -------------------------------------------------------------
>
>                 Key: MESOS-3280
>                 URL: https://issues.apache.org/jira/browse/MESOS-3280
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, replicated log
>    Affects Versions: 0.23.0
>         Environment: Zookeeper version 3.4.5--1
>            Reporter: Bernd Mathiske
>            Assignee: Neil Conway
>              Labels: mesosphere
>             Fix For: 0.26.0
>
>         Attachments: rep-log-race-cond-logs.tar.gz,
> rep-log-startup-race-test-1.patch
>
>
> In a 5 node cluster with 3 masters and 2 slaves, and ZK on each node, when a
> network partition is forced, all the masters apparently lose access to their
> replicated log. The leading master halts. Unknown reasons, but presumably
> related to replicated log access. The others fail to recover from the
> replicated log. Unknown reasons. This could have to do with ZK setup, but it
> might also be a Mesos bug.
> This was observed in a Chronos test drive scenario described in detail here:
> https://github.com/mesos/chronos/issues/511
> With setup instructions here:
> https://github.com/mesos/chronos/issues/508

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)