[jira] [Commented] (MESOS-5193) Recovery failed: Failed to recover registrar on reboot of mesos master

Priyanka Gupta (JIRA) Fri, 29 Apr 2016 11:43:41 -0700

    [ https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264513#comment-15264513 ]


Priyanka Gupta commented on MESOS-5193:
---------------------------------------

Hi [~jieyu] 

I tried changing the work dir as you suggested, but with no luck. Attaching the 
logs again. 
Test scenario: node1 was the leading master. I rebooted node1 and node2 became 
master. All was fine. Once node1 was back, I rebooted node2 (the current leading 
master). node3 became master and exited, then node1 tried to become master and 
failed, and then node2 also failed.

> Recovery failed: Failed to recover registrar on reboot of mesos master
> ----------------------------------------------------------------------
>
>                 Key: MESOS-5193
>                 URL: https://issues.apache.org/jira/browse/MESOS-5193
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.22.0, 0.27.0
>            Reporter: Priyanka Gupta
>              Labels: master, mesosphere
>         Attachments: node1.log, node1_after_work_dir.log, node2.log, 
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all, 
> We are running a 3-node cluster with mesos master, mesos slave and zookeeper 
> on all of them, with chronos on top of it. The problem is that when we reboot 
> the leading mesos master, the other nodes try to get elected as leader but 
> fail with a registrar recovery error: 
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within 
> 1mins"
> The next node then tries to become the leader but fails with the same error. 
> I am not sure what causes this. We are currently using mesos 0.22 and also 
> tried upgrading to mesos 0.27, but the problem continues to happen. 
>  /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir 
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system?
> Thanks,
> Priyanka
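
For reference, here is a minimal sketch of the majority arithmetic behind the --quorum=2 setting in the command quoted above, assuming three masters as described. The helper names majority_quorum and enough_masters are hypothetical and not part of Mesos. The sketch only checks the counts; it says nothing about whether each master's replicated log under --work_dir survived the reboot.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("num_masters must be at least 1")
    return num_masters // 2 + 1


def enough_masters(reachable: int, quorum: int) -> bool:
    """True if the number of reachable masters meets the quorum."""
    return reachable >= quorum


# Three masters (node1, node2, node3), as in the quoted command line.
quorum = majority_quorum(3)

# One master rebooting leaves two reachable, which still meets the quorum.
one_down_ok = enough_masters(reachable=2, quorum=quorum)

# Two masters down at once leaves one reachable, which does not.
two_down_ok = enough_masters(reachable=1, quorum=quorum)
```

By this count a single reboot should leave enough masters to recover, so the arithmetic alone does not explain the failure in the scenario above.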



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
