wojiaodoubao commented on PR #943:
URL: https://github.com/apache/ratis/pull/943#issuecomment-1772002266
Thanks @SzyWilliam @szetszwo for your great reply! IMHO, the root cause is
that we can't find any majority of C_new that contains no new peers.
Examples of bad cases (new peers are highlighted):
1. Change from {n0} to {n0, ***n1***}.
2. Change from {n0, n1, n2} to {n0, n1, n2, ***n3, n4, n5***}.
3. Change from {n0, n1, n2} to {n0, ***n3, n4***}.
4. Change from {n0, n1, n2} to {n0, n1, ***n3, n4***}.
Replacing, adding, or removing peers one at a time handles most cases, but it
still fails in case 1, the change from {n0} to {n0, n1}.
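To make the condition concrete, here is a minimal sketch of the check, not Ratis code. It assumes peers can be modeled as plain strings, and `MajorityCheck` and `hasMajorityWithoutNewPeers` are hypothetical names introduced only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Ratis: peers are modeled as plain strings.
class MajorityCheck {
  /** Returns true if the peers of cNew that were already in cOld form a majority of cNew. */
  static boolean hasMajorityWithoutNewPeers(Set<String> cOld, Set<String> cNew) {
    Set<String> retained = new HashSet<>(cNew);
    retained.retainAll(cOld);            // peers of C_new that are not new
    int majority = cNew.size() / 2 + 1;  // strict majority of C_new
    return retained.size() >= majority;
  }

  public static void main(String[] args) {
    Set<String> three = Set.of("n0", "n1", "n2");
    // The four bad cases listed above: each prints false.
    System.out.println(hasMajorityWithoutNewPeers(Set.of("n0"), Set.of("n0", "n1")));
    System.out.println(hasMajorityWithoutNewPeers(three, Set.of("n0", "n1", "n2", "n3", "n4", "n5")));
    System.out.println(hasMajorityWithoutNewPeers(three, Set.of("n0", "n3", "n4")));
    System.out.println(hasMajorityWithoutNewPeers(three, Set.of("n0", "n1", "n3", "n4")));
    // Adding a single peer to a three-node group: prints true.
    System.out.println(hasMajorityWithoutNewPeers(three, Set.of("n0", "n1", "n2", "n3")));
  }
}
```

Case 1 fails because C_new = {n0, n1} needs 2 votes for a majority, and only n0 is an old peer.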
The solution might be:
1. RaftServerImpl#setConfiguration checks C_new and throws an exception if it
can't find any majority without new peers in C_new. This might be the
easiest way to fix the problem. The shortcoming is that it no longer allows
arbitrary conf changes as described in the Raft paper.
2. Configure a vote whitelist for new peers. This is a little tricky, because
we must remove the whitelist as soon as the new peer's conf is updated to
old_and_new/transitional. Otherwise the new peer may vote for a removed peer.
3. Start each new peer with {C_old, peer_itself}, and when a server that is not
in the conf asks for a vote, don't let it crash. Preventing nodes from crashing
may cause very dangerous problems, which is probably why we let them crash. I
haven't investigated this. Please help if you know the reason for the crash.
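As a rough illustration of option 1, the guard could look something like the sketch below. This is not the actual RaftServerImpl#setConfiguration code: `ConfChangeGuard` and `checkNewConf` are hypothetical names, peers are modeled as strings, and the choice of `IllegalArgumentException` is an assumption.

```java
import java.util.Set;

// Hypothetical guard, not part of Ratis: rejects a C_new whose old peers
// cannot form a majority of C_new on their own.
class ConfChangeGuard {
  static void checkNewConf(Set<String> cOld, Set<String> cNew) {
    long retained = cNew.stream().filter(cOld::contains).count();
    int majority = cNew.size() / 2 + 1;
    if (retained < majority) {
      throw new IllegalArgumentException(
          "Rejected conf change: only " + retained + " old peer(s) in C_new, but a majority needs "
              + majority);
    }
  }

  public static void main(String[] args) {
    // Accepted: three old peers are a majority of the four peers in C_new.
    checkNewConf(Set.of("n0", "n1", "n2"), Set.of("n0", "n1", "n2", "n3"));
    try {
      // Rejected: case 1, {n0} to {n0, n1}.
      checkNewConf(Set.of("n0"), Set.of("n0", "n1"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With such a guard, case 1 would be rejected outright, so growing a single-node group to two nodes would still need one of the other options.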