----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/39325/ -----------------------------------------------------------
(Updated Oct. 29, 2015, 4:55 a.m.) Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen. Bugs: MESOS-3280 https://issues.apache.org/jira/browse/MESOS-3280 Repository: mesos Description (updated) ------- MESOS-3280. The basic problem is that replicas silently ignore inbound Promise and Write requests if they have not finished the recovery protocol yet (because they can't safely vote on such requests). Hence, if we try to do a Paxos round while a quorum of nodes have not finished recovering, the Paxos round will never complete. In particular, this might happen during coordinator election: coordinator election (which is implemented as performing a full Paxos round) starts as soon as the candidate coordinator replica has finished the recovery protocol. If several nodes start concurrently, a quorum of those nodes might still be executing the recovery protocol, and hence the coordinator will never be elected. To address this, add "ignored" responses to the Promise and Write sub-protocols: if a proposer sees a quorum of "ignored" responses to a promise or write request it has issued, it knows the request will never succeed. When used for coordinator election, the current coding will retry immediately (without a backoff). Note that replicas will still silently drop promise/write requests if another kind of problem occurs (e.g., an I/O error prevents reading/writing log data). We might consider changing this, although it will require some thought: e.g., if a replica's disk is broken, sending an "ignored" message on every request might flood the network. REVIEWER NOTES: * GMock code is ugly, but see notes below. * Should we add exponential backoff when retrying coordinator election? Diffs ----- src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 Diff: https://reviews.apache.org/r/39325/diff/ Testing ------- "make check" passes, including a new test that uses a newly constructed mock to ensure we're testing the message schedule described above. I also wrote a script stops and starts mesos-master in a loop, removing the replicated log each time. Without the patch, this occasionally fails with a "registry fetch" timeout; with the patch, you can observe several scenarios where coordinator election is reborted and retried because a quorum of ignored responses is seen. Note that in some cases, we need to retry coordinator election up to ~70 times (!), because we don't currently use a backoff; that should probably be fixed, per comments above. But the important point is that election eventually succeeds and we don't hang. Thanks, Neil Conway