[ https://issues.apache.org/jira/browse/MESOS-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601356#comment-14601356 ]
Michael Schenck commented on MESOS-2934:
----------------------------------------

Does this only occur when growing from 3 to 5 masters, or is 5 the maximum number of masters? (As in, you cannot do 7 masters with {{--quorum=4}}.)

> Mesos master crashes when quorum set to 4
> -----------------------------------------
>
>                 Key: MESOS-2934
>                 URL: https://issues.apache.org/jira/browse/MESOS-2934
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.22.1
>         Environment: CentOS 7
>                      Java 1.7.0_55
>            Reporter: Craig W
>            Priority: Minor
>              Labels: documentaion
>
> When deploying 5 mesos masters, with quorum set to 4, the masters start up
> but fail to stay running. Instead they exit and then restart (Monit is used
> to supervise the process) within a few seconds. This cycle continues non-stop.
> The logs on the master look like this:
> {noformat}
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover request
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover
> The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
> Elected as the leading master!
> Recovering from registrar
> Recovering registrar
> Unable to finish the recover protocol in 10secs, retrying
> Unable to finish the recover protocol in 10secs, retrying
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> {noformat}
> When I change the quorum to 2 and run just 3 mesos master processes, the
> cluster stays up without a hitch.
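For reference while comparing the configurations mentioned above (3 masters with quorum 2, 5 masters with quorum 4, 7 masters with quorum 4), here is a minimal sketch of the usual majority rule for sizing a quorum. It assumes the convention, as I understand it from the Mesos operational guide, that a cluster of 2N-1 masters is run with a quorum of N. The helper names `recommended_quorum` and `tolerated_failures` are hypothetical and are not part of Mesos. The sketch does not establish that a quorum of 4 on 5 masters is the cause of the crash reported here.

```python
def recommended_quorum(num_masters: int) -> int:
    """Smallest strict majority of num_masters (hypothetical helper)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int, quorum: int) -> int:
    """Masters that can be unreachable while `quorum` of them remain."""
    if not 1 <= quorum <= num_masters:
        raise ValueError("quorum must be between 1 and num_masters")
    return num_masters - quorum


if __name__ == "__main__":
    # Compare the smallest-majority quorum for the cluster sizes in this thread.
    for n in (3, 5, 7):
        q = recommended_quorum(n)
        print(f"{n} masters: smallest majority quorum = {q}, "
              f"tolerates {tolerated_failures(n, q)} failure(s)")
```

Under this rule, 3 masters with quorum 2 and 7 masters with quorum 4 both use the smallest majority, while 5 masters with quorum 4 is one above it and leaves room for only a single unreachable master.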