[ https://issues.apache.org/jira/browse/MESOS-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu updated MESOS-5114:
--------------------------
    Fix Version/s: 0.23.2
                   0.27.3
                   0.24.2
                   0.25.1
                   0.26.1

> empty quorum config causes masters to fail replica recovery and fail
> --------------------------------------------------------------------
>
>                 Key: MESOS-5114
>                 URL: https://issues.apache.org/jira/browse/MESOS-5114
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>    Affects Versions: 0.28.0
>         Environment: CentOS 7.1
>            Reporter: Cosmin Lehene
>             Fix For: 0.26.1, 0.25.1, 0.24.2, 0.28.1, 0.27.3, 0.23.2
>
>
> A missing default for quorum size has generated the following master config
> {code}
> MESOS_WORK_DIR="/var/lib/mesos/master"
> MESOS_ZK="zk://zk1:2181,zk2:2181,zk3:2181/mesos"
> MESOS_QUORUM=
> MESOS_PORT=5050
> MESOS_CLUSTER="mesos"
> MESOS_LOG_DIR="/var/log/mesos"
> MESOS_LOGBUFSECS=1
> MESOS_LOGGING_LEVEL="INFO"
> {code}
> This was causing each elected leader to attempt replica recovery.
> E.g. {{group.cpp:700] Trying to get '/mesos/log_replicas/0000000012' in ZooKeeper}}
> And eventually:
> {{master.cpp:1458] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins}}
> Full log on one of the masters
> https://gist.github.com/clehene/09a9ddfe49b92a5deb4c1b421f63479e
> All masters and zk nodes were reachable over the network.
> Also once the quorum was configured the master recovery protocol finished gracefully.
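
For anyone generating master configs from templates, an empty MESOS_QUORUM like the one quoted above can be caught before the master is started. The following is only a minimal sketch of such a pre-flight check. It is not part of Mesos and is not the fix tracked by this issue. The helper names parse_env and quorum_problem are hypothetical, and it assumes the config is a flat list of NAME=value lines as in the report.

```python
def parse_env(text):
    """Parse NAME=value lines into a dict, dropping surrounding quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("\"'")
    return settings


def quorum_problem(settings):
    """Return a description of the MESOS_QUORUM problem, or None if it looks sane."""
    value = settings.get("MESOS_QUORUM")
    if value is None:
        return "MESOS_QUORUM is not set"
    if value == "":
        return "MESOS_QUORUM is empty"
    if not value.isdigit() or int(value) < 1:
        return "MESOS_QUORUM must be a positive integer, got %r" % value
    return None


# The master config quoted in the report.
CONFIG = """\
MESOS_WORK_DIR="/var/lib/mesos/master"
MESOS_ZK="zk://zk1:2181,zk2:2181,zk3:2181/mesos"
MESOS_QUORUM=
MESOS_PORT=5050
MESOS_CLUSTER="mesos"
MESOS_LOG_DIR="/var/log/mesos"
MESOS_LOGBUFSECS=1
MESOS_LOGGING_LEVEL="INFO"
"""

if __name__ == "__main__":
    problem = quorum_problem(parse_env(CONFIG))
    if problem:
        print("refusing to start master:", problem)
```

The check only confirms that the value is present and is a positive integer. Whether it is a sensible quorum for the cluster (a majority of the masters) still has to be verified against the actual number of masters, which the report does not state.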