[ https://issues.apache.org/jira/browse/MESOS-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-2934:
---------------------------------------
    Labels: documentation  (was: documentaion)

> Mesos master crashes when quorum set to 4
> -----------------------------------------
>
>                 Key: MESOS-2934
>                 URL: https://issues.apache.org/jira/browse/MESOS-2934
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.22.1
>         Environment: CentOS 7
>                      Java 1.7.0_55
>            Reporter: Craig W
>            Priority: Minor
>              Labels: documentation
>
> When deploying 5 mesos masters, with quorum set to 4, the masters start up
> but fail to stay running. Instead they exit and then restart (Monit is used
> to supervise the process) within a few seconds. This cycle continues non-stop.
> The logs on the master look like this:
> {noformat}
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover request
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover
> The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
> Elected as the leading master!
> Recovering from registrar
> Recovering registrar
> Unable to finish the recover protocol in 10secs, retrying
> Unable to finish the recover protocol in 10secs, retrying
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> {noformat}
> When I change the quorum to 2 and run just 3 mesos master processes, the
> cluster stays up without a hitch.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)