[ https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264513#comment-15264513 ]
Priyanka Gupta commented on MESOS-5193:
---------------------------------------

Hi [~jieyu],

I tried changing the work dir as you suggested, but with no luck. Attaching the logs again.

Test scenario:
Node1 is the leading master. I rebooted node1, and node2 became master. All is fine.
Once node1 was back, I rebooted node2 (the current leading master). Node3 became master and then exited, then node1 tried to become master and failed, and then node2 also failed.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> ----------------------------------------------------------------------
>
>                 Key: MESOS-5193
>                 URL: https://issues.apache.org/jira/browse/MESOS-5193
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.22.0, 0.27.0
>            Reporter: Priyanka Gupta
>              Labels: master, mesosphere
>         Attachments: node1.log, node1_after_work_dir.log, node2.log, node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all,
>
> We are using a 3-node cluster with mesos master, mesos slave and zookeeper on all of them. We are using chronos on top of it. The problem is that when we reboot the mesos master leader, the other nodes try to get elected as leader but fail with a registrar recovery error:
>
>     "Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins"
>
> The next node then tries to become the leader but fails again with the same error.
> I am not sure what is causing the issue. We are currently using mesos 0.22 and also tried upgrading to mesos 0.27, but the problem still happens.
>
> The master is started with:
>
>     /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
>
> Can you please help us resolve this issue, as it's a production system?
>
> Thanks,
> Priyanka

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)