The nodes come up quite fast, but I will try increasing the timeout for a test anyway. Either way, shouldn't the system retry automatically instead of just repeatedly logging "Replica in EMPTY status received a broadcasted recover request" after a couple of failures?
Thanks for the answer.

On 25 November 2015 at 17:31, Marco Massenzio <ma...@mesosphere.io> wrote:

> A quick glance of the logs doesn't show anything that stands out, apart
> from:
>
> --zk_session_timeout="10secs"
>
> which seems to lead to:
>
> Nov 23 16:50:13 node1 mesos-master[17501]: I1123 16:50:13.594151 17521
> recover.cpp:111] Unable to finish the recover protocol in 10secs,
> retrying
>
> That is the default value, but maybe your setup may need longer than that
> (it is possible that the time it takes for all master nodes to come up and
> reach quorum may be the issue).
>
> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Wed, Nov 25, 2015 at 3:06 AM, Guilherme Moro <guilherme.m...@ammeon.com>
> wrote:
>
> > https://issues.apache.org/jira/browse/MESOS-4010
> >
> > On 24 November 2015 at 13:55, Klaus Ma <klaus1982...@gmail.com> wrote:
> >
> > > I'd suggest to open a JIRA to trace issue; I think you can append
> > > master.log & slave.log for owner reference.
> > >
> > > ----
> > > Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> > > Platform Symphony/DCOS Development & Support, STG, IBM GCG
> > > +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
> > >
> > > On Tue, Nov 24, 2015 at 8:45 PM, Guilherme Moro <guilherme.m...@ammeon.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm having a problem while trying to create the initial cluster, no leader
> > > > is elected.
> > > > For a start, let me explain my setup:
> > > > 3 nodes
> > > > 3 zookeepers
> > > > 3 mesos-master services, configured as initctl services and controlled by
> > > > puppet, RPM's installed are from the RHEL repository at mesosphere
> > > > (installed through puppet as well), running on RHEL 6.6
> > > > Quorum is set to 2, as expected, all the remaining configs were double
> > > > checked and appears to be correct.
> > > >
> > > > Most of times I can get the cluster to bootstrap after rebooting the nodes
> > > > (sometimes more than once).
> > > > The whole thing resembles a bit
> > > > https://issues.apache.org/jira/browse/MESOS-2148 and
> > > > https://issues.apache.org/jira/browse/MESOS-2014
> > > >
> > > > Even when I get the master elected, sometimes another couple of reboots or
> > > > restarts of the services are needed to get all the slave nodes added (they
> > > > are the same nodes as the masters).
> > > >
> > > > I can quite easily reproduce this behavior, if someone cares to look at
> > > > logs tell me exactly what to collect and what logging flags I should
> > > > enable.
> > > >
> > > > So, should I maybe open a bug or is there any trick to bootstrap the
> > > > cluster that I'm losing here.
> > > >
> > > > Regards,
> > > >
> > > > Guilherme Moro
> > > >
> > > > --
> > > > This email and any files transmitted with it are confidential and intended
> > > > solely for the use of the individual or entity to whom they are addressed.
> > > > If you have received this email in error please notify the system manager.
> > > > This message contains confidential information and is intended only for the
> > > > individual named. If you are not the named addressee you should not
> > > > disseminate, distribute or copy this e-mail.
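A side note on the quorum arithmetic from the original report ("Quorum is set to 2, as expected", with 3 masters): the value is expected to be a strict majority of the masters. The sketch below only illustrates that rule; the helper names `majority_quorum` and `tolerated_failures` are hypothetical and are not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Return the smallest count that is a strict majority of num_masters.

    Hypothetical helper for illustration only; not a Mesos API.
    """
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Return how many masters can be down while a majority is still reachable."""
    return num_masters - majority_quorum(num_masters)


if __name__ == "__main__":
    # Print the majority size and failure tolerance for a few cluster sizes.
    for n in (1, 3, 5):
        print(f"masters={n} quorum={majority_quorum(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

For the 3-master setup described in the thread this gives a quorum of 2, which matches the reported configuration, so at least two masters have to be up and reachable before recovery can complete.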