Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
             :   will be unavailable.
> So this works around KUDU-1620, right? What if we were to restart the remai
I started with original nodes A (dead), B*, C (* marks the leader), and 
attempted to replace A with D. I went through this process; a few things to note:
* When I brought up D, both B and C's /masters pages successfully updated A's 
address to D's.
* Looking at B's logs, this was not the case; it was still trying to contact A, 
as expected.
* Looking at D's logs, I could see it losing a bunch of pre-elections since the 
remaining two masters already had a quorum (also, D's web UI showed four 
masters, its UUID duplicated, both showing D's address).
* After updating the DNS aliases, I restarted C. Once it came up, B continued 
to be leader, and D still was not allowed in.
* After restarting B, C was elected, and the logs appeared normal across B, 
C*, D.
* Interestingly, at the end of this all, B's, C's, and D's web UIs all showed 
an exact duplicate for D (rpc address and all).

So it seems like nothing "goes wrong" with this approach, but I think while C 
was restarting, we were unavailable: single leader but no voters, and an 
effectively bricked replacement node, resulting in an extremely familiar window 
of unavailability of size <length of master restart>.
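
To spell out the quorum arithmetic behind that window, here is a rough sketch. 
It is not Kudu code: has_majority and the node names are made up for 
illustration, and it assumes B's Raft config still lists A, B, and C as the 
voters (which is what B's logs suggested).

```python
def has_majority(voters, reachable):
    """Return True if the reachable voters form a strict majority of the config."""
    live = len(set(voters) & set(reachable))
    return live > len(voters) // 2


# Assumption: B's config still names the original three masters as voters.
config_seen_by_b = ["A", "B", "C"]

# A is dead and C is restarting. D is up, but it is not a voter in B's config,
# so it does not count toward the majority.
during_c_restart = has_majority(config_seen_by_b, reachable=["B", "D"])

# Once C comes back, B and C form a majority again.
after_c_restart = has_majority(config_seen_by_b, reachable=["B", "C", "D"])

print(during_c_restart, after_c_restart)
```

With only one of three voters reachable, B can stay leader in name but cannot 
commit anything until C returns.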

If, after I updated the DNS aliases, I'd restarted B* instead, would things 
have been different? With no leader, would we have been forced into an 
election? No; things would be pretty much the same--D and C would not have 
been able to accept ops individually, and would not have elected a leader for 
the same unfortunate DNS alias reasons.

TL;DR: Doesn't seem like it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-HasComments: Yes
