[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Mike Percy has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Mike Percy has submitted this change and it was merged. Change subject: docs: clarify steps for changing master from multi-master deployment .. docs: clarify steps for changing master from multi-master deployment The current docs for multi-master migration discuss moving up from a single-master deployment to multi-master, but some users may want to move in the other direction. We've had to rely on the existing docs and have these users use their imagination to go through this. I've added docs specifying the process and parameters to do so. Additionally, this patch clarifies steps for multi-master recovery in case the cluster was configured without DNS aliases. Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Reviewed-on: http://gerrit.cloudera.org:8080/8032 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo Reviewed-by: Mike Percy --- M docs/administration.adoc 1 file changed, 77 insertions(+), 9 deletions(-) Approvals: Mike Percy: Looks good to me, approved Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Andrew Wong has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > > due to caching Related to KUDU-1620: we don't rebuild the consensus peer proxies, and we assume that peers will come back at the same locations, which isn't always the case (e.g. if we change the DNS aliases to point to a different location). This "caching" of master locations necessitates restarting the masters so that they pick up the newly updated DNS alias hostnames. -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
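The "caching" described in the comment above can be illustrated with a toy model. This is only a sketch, not Kudu code: `FakeDns`, `CachedPeerProxy`, `start_master`, the alias names, and the addresses are all hypothetical. The one assumption taken from the comment is that a master resolves each peer's location when its proxies are built and does not rebuild them afterwards.

```python
class FakeDns:
    """Hypothetical in-memory stand-in for DNS: alias -> address."""

    def __init__(self, records):
        self.records = dict(records)

    def resolve(self, alias):
        return self.records[alias]


class CachedPeerProxy:
    """Resolves its peer's alias once, at construction, and never again."""

    def __init__(self, dns, alias):
        self.alias = alias
        self.address = dns.resolve(alias)


def start_master(dns, peer_aliases):
    """Models a master (re)start: one freshly resolved proxy per peer."""
    return {alias: CachedPeerProxy(dns, alias) for alias in peer_aliases}


dns = FakeDns({"master-a": "10.0.0.1", "master-c": "10.0.0.3"})

# Master B starts and builds proxies for its peers A and C.
b_proxies = start_master(dns, ["master-a", "master-c"])

# The alias for A is repointed at the replacement machine.
dns.records["master-a"] = "10.0.0.4"

# B's existing proxy still holds the old location.
stale = b_proxies["master-a"].address

# Only after B restarts does it see the new location.
b_proxies = start_master(dns, ["master-a", "master-c"])
fresh = b_proxies["master-a"].address

print(stale, fresh)
```

In this model the only way for B to pick up the repointed alias is to rebuild its proxies, which corresponds to the restart requirement discussed in the thread.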
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Mike Percy has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > Thanks for running that experiment. What you observed makes sense. > due to caching What caching? -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Adar Dembo has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > In this DNS alias approach, though, there is no rewriting of configs any ma Thanks for running that experiment. What you observed makes sense. -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Andrew Wong has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > The problem is you only updated the config files on disk. You didn't add th In this DNS alias approach, though, there is no rewriting of configs on any master. We just copy over the WAL and change the DNS aliases to point to the new master. Due to caching, the old masters don't see the new aliases, hence the need to restart. Perhaps the above applies to the removal steps below, though (which do have the user stop all masters first and rewrite configs). -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Mike Percy has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > Original nodes A (dead), B*, C, and attempted to replace A with D. Tried go The problem is you only updated the config files on disk. You didn't add the configuration change to the WAL, and Raft replicates what's in the WAL. Come to think of it, it's quite dangerous to do it in a rolling fashion because the configuration is changing without updating the WAL and therefore 2 servers can have different configs but either one can get elected since they have the same last-logged opid in the WAL. In short, until we support config change on the master we should bring down all of the masters before modifying all of their configs and bringing them all back up. -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
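The hazard described in the comment above can be sketched with a deliberately simplified election model. This is not Kudu's consensus implementation: the `Master` type, `grants_vote`, `can_win`, and the opid values are hypothetical. The model keeps only the Raft log-currency check and ignores terms, one-vote-per-term bookkeeping, and timeouts. It assumes a rolling update in which B's on-disk config has already been rewritten to replace A with D, while C still has the old config and neither WAL has changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Master:
    name: str
    last_opid: tuple   # (term, index) of the last entry in the WAL
    config: frozenset  # peer names taken from the on-disk config


def grants_vote(voter, candidate):
    # Simplified Raft rule: vote only if the candidate's log is at least
    # as current as the voter's. The config is not consulted at all.
    return candidate.last_opid >= voter.last_opid


def can_win(candidate, live_masters):
    # A candidate needs a majority of the peers listed in ITS OWN config.
    votes = sum(
        1 for m in live_masters
        if m.name in candidate.config and grants_vote(m, candidate)
    )
    return votes > len(candidate.config) // 2


# Mid-way through a rolling update: same WAL position, different configs.
b = Master("B", (3, 10), frozenset({"B", "C", "D"}))  # already rewritten
c = Master("C", (3, 10), frozenset({"A", "B", "C"}))  # not yet rewritten
live = [b, c]

b_can_win = can_win(b, live)
c_can_win = can_win(c, live)
configs_differ = b.config != c.config

print(b_can_win, c_can_win, configs_differ)  # True True True
```

Because the vote depends only on the last-logged opid, either master can be elected even though they disagree about cluster membership. Stopping all masters before rewriting any config, as suggested above, removes the window in which the two configs coexist.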
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Andrew Wong has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. > So this works around KUDU-1620, right? What if we were to restart the remai
Original nodes A (dead), B*, C, and attempted to replace A with D. Tried going through this process and a few things to note:
* When I brought up D, both B's and C's /masters pages successfully updated A's address to D's.
* Looking at B's logs, this was not the case; it was still trying to contact A, as expected.
* Looking at D's logs, I could see it losing a bunch of pre-elections since the remaining two masters already had a quorum (also, D's web UI showed four masters, with its UUID duplicated and both entries showing D's address).
* After updating the DNS aliases, I restarted C. Once it came up, B continued to be leader, and D still was not allowed in.
* After restarting B, C was elected, and the logs appeared normal across B, C*, D.
* Interestingly, at the end of all this, B's, C's, and D's web UIs all showed an exact duplicate for D (RPC address and all).
So it seems like nothing "goes wrong" with this approach, but I think that while C was restarting, we were unavailable: a single leader but no voters, and an effectively bricked replacement node, resulting in an extremely familiar window of unavailability of size . If, after I updated the DNS aliases, I'd restarted B* instead, would things have been different? With no leader, would we have been forced into an election? No; things would have been pretty much the same--D and C would not have been able to accept ops individually, and would not have elected a leader for the same unfortunate DNS alias reasons. TL;DR: Doesn't seem like it.
-- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Adar Dembo has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc File docs/administration.adoc: PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster : will be unavailable. So this works around KUDU-1620, right? What if we were to restart the remaining masters one at a time? Would that allow us to avoid any downtime? -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Andrew Wong has posted comments on this change. Change subject: docs: clarify steps for changing master from multi-master deployment .. Patch Set 3: (5 comments) See the rendering here: https://github.com/andrwng/kudu/blob/006ca06da2a91f178ba21a31fa19d01e710c9fd8/docs/administration.adoc http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc File docs/administration.adoc: Line 379: master. > Nit: other WARNING text begins with a capital letter. Below too. Done PS2, Line 382: this workflow without also restarting the live masters. As such, the workflow requires a : maintenance window, albeit a > You are technically correct (the best kind of correct) but there are nuance I added a warning to ensure the leader will be kept (otherwise, at the risk of severe data loss). PS2, Line 382: this workflow without also restarting the live masters. As such, the workflow requires a : maintenance window, albeit a > Please double check this with Mike. Done PS2, Line 392: > nit: master nodes? Removing this line since I agree with Adar. PS2, Line 392: > I don't really understand why this instruction is worth including. Yes, it Done -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] docs: clarify steps for changing master from multi-master deployment
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8032 to look at the new patch set (#3). Change subject: docs: clarify steps for changing master from multi-master deployment .. docs: clarify steps for changing master from multi-master deployment The current docs for multi-master migration discuss moving up from a single-master deployment to multi-master, but some users may want to move in the other direction. We've had to rely on the existing docs and have these users use their imagination to go through this. I've added docs specifying the process and parameters to do so. Additionally, this patch clarifies steps for multi-master recovery in case the cluster was configured without DNS aliases. Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d --- M docs/administration.adoc 1 file changed, 77 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8032/3 -- To view, visit http://gerrit.cloudera.org:8080/8032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy