[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-18 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


docs: clarify steps for changing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Additionally, this patch clarifies steps for multi-master recovery in
case the cluster was configured without DNS aliases.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Reviewed-on: http://gerrit.cloudera.org:8080/8032
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
Reviewed-by: Mike Percy 
---
M docs/administration.adoc
1 file changed, 77 insertions(+), 9 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/8032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-18 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> > due to caching
Related to KUDU-1620, we don't rebuild the consensus peer proxies and we assume 
that peers will come back at the same locations, which isn't always the 
case (e.g. if we change the DNS aliases to point to a different location). This 
"caching" of master locations necessitates restarting the masters to get the 
newly-updated DNS alias hostnames.
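
As a rough illustration of the "caching" described above, here is a toy model
(not Kudu code; PeerProxy, the alias name, and the addresses are all
hypothetical). It assumes a proxy resolves its peer's alias once when it is
built and never again, so only a rebuilt proxy (i.e. a restarted master) sees
the repointed alias.

```python
from typing import Callable, Dict


class PeerProxy:
    """Toy stand-in for a consensus peer proxy (hypothetical, not Kudu's class)."""

    def __init__(self, alias: str, resolve: Callable[[str], str]) -> None:
        self.alias = alias
        # Assumption: the address is resolved once here and never refreshed.
        self.address = resolve(alias)


# Hypothetical DNS table: alias -> address.
dns: Dict[str, str] = {"master-1": "10.0.0.1"}


def resolve(alias: str) -> str:
    return dns[alias]


proxy = PeerProxy("master-1", resolve)

# The operator repoints the alias at the replacement master.
dns["master-1"] = "10.0.0.4"

# The long-lived proxy still holds the old location.
stale_address = proxy.address

# Restarting the master is modeled as building a fresh proxy.
fresh_address = PeerProxy("master-1", resolve).address

print(stale_address, fresh_address)
```

The point of the sketch is only that the stale location lives in the
long-running process, not in DNS, which is why a restart is needed.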




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> Thanks for running that experiment. What you observed makes sense.
> due to caching

What caching?




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> In this DNS alias approach, though, there is no rewriting of configs any ma
Thanks for running that experiment. What you observed makes sense.




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> The problem is you only updated the config files on disk. You didn't add th
In this DNS alias approach, though, there is no rewriting of configs on any 
master. We just copy over the WAL and change the DNS aliases to point to the 
new master. Due to caching, the old masters don't see the new aliases, hence 
the need to restart.

Perhaps the above applies to the removal steps below though (which do have 
the user stop all masters first and rewrite configs).




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> Original nodes A (dead), B*, C, and attempted to replace A with D. Tried go
The problem is you only updated the config files on disk. You didn't add the 
configuration change to the WAL, and Raft replicates what's in the WAL. Come to 
think of it, it's quite dangerous to do it in a rolling fashion because the 
configuration is changing without updating the WAL and therefore 2 servers can 
have different configs but either one can get elected since they have the same 
last-logged opid in the WAL.

In short, until we support config change on the master we should bring down all 
of the masters before modifying all of their configs and bringing them all back 
up.
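
To make the hazard above concrete, here is a minimal sketch (not Kudu code; the
Server type, its field names, and the opid values are hypothetical). It assumes
the simplified Raft rule that a voter grants its vote to any candidate whose
last-logged opid is at least as recent as its own, and it ignores terms already
voted in. The rule never looks at the config, so two masters with different
on-disk configs but the same last-logged opid can each win the other's vote.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# An opid is modeled as (term, index); tuples compare term first, then index.
OpId = Tuple[int, int]


@dataclass(frozen=True)
class Server:
    name: str
    last_logged_opid: OpId
    # Peers taken from the hand-edited config on disk, not from the WAL.
    on_disk_config: FrozenSet[str]

    def grants_vote_to(self, candidate: "Server") -> bool:
        # Simplified log-up-to-date check; the config is not consulted at all.
        return candidate.last_logged_opid >= self.last_logged_opid


# Mid-way through a rolling change: B already has the new config, C the old one.
b = Server("B", (5, 100), frozenset({"B", "C", "D"}))
c = Server("C", (5, 100), frozenset({"A", "B", "C"}))

configs_differ = b.on_disk_config != c.on_disk_config
either_can_be_elected = b.grants_vote_to(c) and c.grants_vote_to(b)

print(configs_differ, either_can_be_elected)
```

Stopping every master before editing any config avoids this state entirely,
since no election can happen while the configs disagree.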




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
> So this works around KUDU-1620, right? What if we were to restart the remai
The original nodes were A (dead), B*, and C, and I attempted to replace A with 
D. I went through this process; a few things to note:
* When I brought up D, both B and C's /masters pages successfully updated A's 
address to D's.
* Looking at B's logs, this was not the case; it was still trying to contact A, 
as expected.
* Looking at D's logs, I could see it losing a bunch of pre-elections since the 
remaining two masters already had a quorum (also, D's web UI showed four 
masters, with its UUID duplicated, both entries showing D's address).
* After updating the DNS aliases, I restarted C. Once it came up, B continued 
being leader, and D still was not allowed in.
* After restarting B, C was elected, and the logs appeared normal across B, 
C*, D.
* Interestingly, at the end of all this, B's, C's, and D's web UIs all showed 
an exact duplicate for D (RPC address and all).

So it seems like nothing "goes wrong" with this approach, but I think while C 
was restarting, we were unavailable: single leader but no voters, and an 
effectively bricked replacement node, resulting in an extremely familiar window 
of unavailability of size .

If, after I updated the DNS aliases, I'd restarted B* instead, would things 
have been different? With no leader, would we have been forced into an 
election? No; things would be pretty much the same--D and C would not have 
been able to accept ops individually, and would not have elected a leader for 
the same unfortunate DNS alias reasons.

TL;DR: Doesn't seem like it.
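
The quorum arithmetic behind the unavailability window can be sketched as
follows. This is a toy model, not Kudu code: the helper names are hypothetical,
and it assumes that B's view of the config is still {A, B, C}, with A
unreachable because B's cached location for it is stale.

```python
from typing import Set


def majority(num_voters: int) -> int:
    # Smallest number of voters that forms a strict majority.
    return num_voters // 2 + 1


def has_quorum(config: Set[str], reachable: Set[str]) -> bool:
    # A leader can make progress only if a majority of its config is reachable.
    return len(config & reachable) >= majority(len(config))


# Config as B sees it; D is not part of this view.
b_view = {"A", "B", "C"}

# Before restarting C: B can reach itself and C.
quorum_before_restart = has_quorum(b_view, {"B", "C"})

# While C is restarting: B can reach only itself.
quorum_during_restart = has_quorum(b_view, {"B"})

print(quorum_before_restart, quorum_during_restart)
```

With three voters the majority is two, so losing C on top of the already dead A
leaves the leader alone, which matches the window observed above.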




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be 
sufficient). During this time the Kudu cluster
 :   will be unavailable.
So this works around KUDU-1620, right? What if we were to restart the remaining 
masters one at a time? Would that allow us to avoid any downtime?




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..


Patch Set 3:

(5 comments)

See the rendering here:
https://github.com/andrwng/kudu/blob/006ca06da2a91f178ba21a31fa19d01e710c9fd8/docs/administration.adoc

http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc
File docs/administration.adoc:

Line 379: master.
> Nit: other WARNING text begins with a capital letter. Below too.
Done


PS2, Line 382: this workflow without also restarting the live masters. As such, 
the workflow requires a
 : maintenance window, albeit a 
> You are technically correct (the best kind of correct) but there are nuance
I added a warning to ensure the leader will be kept (otherwise at the risk of 
severe data loss).


PS2, Line 382: this workflow without also restarting the live masters. As such, 
the workflow requires a
 : maintenance window, albeit a 
> Please double check this with Mike.
Done


PS2, Line 392: 
> nit: master nodes?
Removing this line since I agree with Adar.


PS2, Line 392: 
> I don't really understand why this instruction is worth including. Yes, it 
Done




[kudu-CR] docs: clarify steps for changing master from multi-master deployment

2017-09-15 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/8032

to look at the new patch set (#3).

Change subject: docs: clarify steps for changing master from multi-master 
deployment
..

docs: clarify steps for changing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Additionally, this patch clarifies steps for multi-master recovery in
case the cluster was configured without DNS aliases.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
---
M docs/administration.adoc
1 file changed, 77 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8032/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy