Re: Mesos 0.19 registrar upgrade
Ok, thanks Ben! It would be nice to update the documentation accordingly. So, in 0.20 there might be a flag specifying the total number of masters?

On 23 July 2014 00:13, Benjamin Mahler benjamin.mah...@gmail.com wrote:

> At the current time, you need an odd number of masters as there is an assumption built into the replicated log that the number of masters = 2*quorum - 1.
> [...]
> We'd like to have the configuration be explicit about the total number of masters, so that the assumption need not be made.
Mesos 0.19 registrar upgrade
Hi,

what is the best way to upgrade a Mesos cluster from 0.18 to 0.19? I've tried to read all the documentation before doing the actual upgrade, but I still don't understand a few things.

What should be the quorum size? The --help says:

    It is imperative to set this value to be a majority of masters i.e., quorum > (number of masters)/2

I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3, right?

The recover.cpp says:

    we allow a replica in EMPTY status to become VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in EMPTY status

So, with quorum = 3 I would need 5 Mesos masters (that's just not clear from the mesos-master --help).

    quorum=1, mesos-masters=1
    quorum=2, mesos-masters=3
    quorum=3, mesos-masters=5
    quorum=4, mesos-masters=7

Is it possible to have an even number of Mesos masters? Or is it just a bad idea?

With 4 masters I got into a situation where:

master 1:
    I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status received a broadcasted recover request

master 2:
    I0722 11:36:37.593647 7754 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request

master 3:
    I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response from a replica in STARTING status

master 4:
    I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
    I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response from a replica in STARTING status
    I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response from a replica in VOTING status
    I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response from a replica in EMPTY status

And the election algorithm ends up in an endless loop.

How can I recover from this? Delete all replica logs from master disk? Start with quorum=1 and increment number of masters?

Thanks,
Tomas
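To make the mismatch in the question concrete, here is a minimal Python sketch of the arithmetic. It only encodes the two rules quoted above (the majority rule from --help and the 2 * quorum - 1 rule from recover.cpp); the function names are made up for illustration and are not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum satisfying quorum > (number of masters)/2."""
    return num_masters // 2 + 1


def replicas_assumed(quorum: int) -> int:
    """Replica count implied by the 2 * quorum - 1 rule quoted from recover.cpp."""
    return 2 * quorum - 1


if __name__ == "__main__":
    for masters in range(1, 8):
        quorum = majority_quorum(masters)
        assumed = replicas_assumed(quorum)
        note = "ok" if assumed == masters else "mismatch"
        print(f"masters={masters} quorum={quorum} assumed_replicas={assumed} {note}")
```

For every odd number of masters the two rules agree. For 4 masters the majority rule gives quorum=3, which implies 5 replicas, so the two rules disagree for the cluster described above.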
Re: Mesos 0.19 registrar upgrade
On 22 July 2014 10:40, Tomas Barton barton.to...@gmail.com wrote:

> I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3, right?

Yes, that's right. 2 won't be enough.

> quorum=1, mesos-masters=1
> quorum=2, mesos-masters=3
> quorum=3, mesos-masters=5
> quorum=4, mesos-masters=7
>
> Is it possible to have an even number of Mesos masters? Or is it just a bad idea?

Yes, it's a bad idea since this change. It has always been a bad idea to run an even number of ZooKeepers, and now that extends to the Mesos masters. 4 masters give you no extra redundancy over 3, and your likelihood of node loss increases slightly (as you now have an extra server to potentially break).
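The redundancy point can be checked with a couple of lines of Python. This is only a sketch of the majority arithmetic, assuming the quorum is always set to the smallest majority; the helper name is hypothetical.

```python
def tolerated_failures(num_masters: int) -> int:
    """Masters that can be lost while a majority quorum is still reachable."""
    quorum = num_masters // 2 + 1  # smallest majority
    return num_masters - quorum


if __name__ == "__main__":
    for masters in (3, 4, 5):
        print(f"{masters} masters tolerate {tolerated_failures(masters)} failure(s)")
```

Both 3 and 4 masters survive the loss of only one node; it takes 5 masters to survive two.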
Re: Mesos 0.19 registrar upgrade
At the current time, you need an odd number of masters, as there is an assumption built into the replicated log that the number of masters = 2*quorum - 1. This assumption is present when bootstrapping the log from no data.

To recover from this, you need to run an odd number of masters, and set your quorum correctly. For example, 3 masters with quorum 2, or 5 masters with quorum 3. It is safe to wipe the replica logs before doing this.

There are some outstanding tickets to clean this up:

https://issues.apache.org/jira/browse/MESOS-1465
https://issues.apache.org/jira/browse/MESOS-1546

We'd like to have the configuration be explicit about the total number of masters, so that the assumption need not be made.

On Tue, Jul 22, 2014 at 2:40 AM, Tomas Barton barton.to...@gmail.com wrote:

> [...]
> How can I recover from this? Delete all replica logs from master disk? Start with quorum=1 and increment number of masters?
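Before restarting the masters, the planned configuration can be sanity-checked against Ben's guidance (odd number of masters, number of masters = 2*quorum - 1). The following Python sketch is an illustrative pre-flight check only; check_master_config is a hypothetical helper, not a Mesos tool, and it does not touch the cluster or the replica logs.

```python
from typing import List


def check_master_config(num_masters: int, quorum: int) -> List[str]:
    """Return a list of problems with a planned (masters, quorum) pair."""
    problems = []
    if num_masters % 2 == 0:
        problems.append(f"{num_masters} masters is an even number; use an odd number")
    if num_masters != 2 * quorum - 1:
        problems.append(
            f"quorum={quorum} implies {2 * quorum - 1} masters, not {num_masters}"
        )
    return problems


if __name__ == "__main__":
    for masters, quorum in [(3, 2), (5, 3), (4, 3)]:
        problems = check_master_config(masters, quorum)
        status = "ok" if not problems else "; ".join(problems)
        print(f"masters={masters} quorum={quorum}: {status}")
```

The two examples from the reply (3 masters with quorum 2, 5 masters with quorum 3) pass, while the original setup of 4 masters with quorum 3 is flagged on both counts.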