[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-03 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#3).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas
are down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.
The lonely survivor talking to the tool tends to become the leader in
the new config in the majority of use cases because:
a) The API/tool mimics the actual leader the survivor node had voted
   for and replicates the new config with a higher term and a bumped-up
   opid_index. This ensures that the 2 new nodes added as part of
   re-replication respect the term emitted by this node and accept
   this node as leader.
b) The assumption is that the dead nodes are not coming back during
   this recovery, hence the leader chosen in step a) will still be the
   leader when we see the replication factor restored to 3.
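The mechanism above can be sketched as a toy model (illustrative only, not
Kudu code; all class and method names here are hypothetical): a survivor
stuck in a 3-peer config cannot win an election, but after accepting a
forced update carrying a higher term and a 1-peer config, it alone forms a
majority.

```python
class Replica:
    def __init__(self, uuid, term, config):
        self.uuid = uuid
        self.current_term = term
        self.config = config          # set of peer uuids in the Raft config
        self.role = "FOLLOWER"

    def handle_update(self, leader_term, new_config):
        # Raft rule: reject any update from a stale term.
        if leader_term < self.current_term:
            return False
        self.current_term = leader_term
        self.config = new_config
        return True

    def try_elect(self, live_peers):
        # A candidate wins with votes from a majority of its *current* config.
        voters = {self.uuid} | (live_peers & self.config)
        if len(voters) > len(self.config) // 2:
            self.role = "LEADER"
        return self.role

survivor = Replica("A", term=5, config={"A", "B", "C"})
# B and C are dead: A cannot reach a majority of {A,B,C}, so it stays stuck.
assert survivor.try_elect(live_peers=set()) == "FOLLOWER"

# The tool mimics a leader with a bumped term and a one-node config.
survivor.handle_update(leader_term=6, new_config={"A"})
# Now A alone is a majority of {A} and elects itself.
assert survivor.try_elect(live_peers=set()) == "LEADER"
```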

Also, the ReplaceConfig() API adds a way to abort a pending config change,
because a pending config gets in the way of the recovery tool trying to
replicate/commit the new config on the surviving tablet server. Only one
pending config change is allowed at a time for a given tablet, hence
aborting the pending config change seems the safest bet.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect.

TODO:
0) Accept more replica_uuids from the command line to support multiple
   peers being added in the new config.
1) Test with a 5-replica config forcing the old config {ABCDE} to the
   new config {AB} on A.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/time_manager-test.cc
M src/kudu/consensus/time_manager.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
15 files changed, 545 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-07 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#4).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas
are down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.
The lonely survivor being operated on by the tool tends to become the
leader in the new config in the majority of situations because:
a) The API/tool acts as a fake leader and generates the consensus update
   with a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once this config change is reported to the master, the master can
   re-replicate the tablet to other healthy nodes in the cluster.
b) The assumption is that the dead nodes are not coming back during
   this recovery, hence the leader elected in step a) will still be the
   leader when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a way to abort a pending config change,
because a pending config gets in the way of the recovery tool trying to
replicate/commit the new config on the surviving tablet server. Only one
pending config change is allowed at a time for a given tablet, hence
aborting the pending config change seems the safest bet.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect.

TODO:
0) Test with unresponsive servers evicted from the config after the
   'follower_unavailable_considered_failed_sec' timeout.
1) Accept more than one peer to be added from the command line
   to support multiple peers being added in the replaced config.
   This needs a change to ChangeConfigRequestPB to accept multiple
   RaftPeerPB in one single request.
2) Once 1) is in place, test with a 5-replica config replacing
   the old config {ABCDE} with the new config {AB} on A.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
13 files changed, 577 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-07 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS1, Line 1556: RaftConsensus::ReplaceConfig
> I'll let mike address the pending/active config concerns of this method, si
TFTR David, just a heads up that the latest patch doesn't reflect all the 
suggestions made above; some rationales are listed below.

1) I agree with adding checks that consensus has been stuck for a while 
instead of letting the tool operate immediately without allowing automatic 
recovery. Is there a metric we should rely on? Or should that check be a 
function of some peers in the config being unresponsive for a while? The 
latter case falls under node eviction after certain timeouts, so we could 
wait for at least follower_unavailable_considered_failed_sec.
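The guard being discussed could be sketched roughly as follows (a
hypothetical helper, not Kudu's actual check; the 300-second value is
Kudu's default for follower_unavailable_considered_failed_sec):

```python
import time

FOLLOWER_UNAVAILABLE_CONSIDERED_FAILED_SEC = 300  # Kudu's default

def unsafe_change_allowed(last_heard_from, now=None):
    """last_heard_from: dict of peer uuid -> last successful contact
    (epoch seconds). Permit the unsafe config change only once every
    missing peer has been unreachable longer than the eviction window."""
    now = now if now is not None else time.time()
    return all(now - t >= FOLLOWER_UNAVAILABLE_CONSIDERED_FAILED_SEC
               for t in last_heard_from.values())

t0 = 1_000_000
# C has been gone long enough, but B was heard from 10s ago: refuse.
assert not unsafe_change_allowed({"B": t0 - 10, "C": t0 - 400}, now=t0)
# Both peers past the window: the operator may proceed.
assert unsafe_change_allowed({"B": t0 - 400, "C": t0 - 500}, now=t0)
```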

2) I am having second thoughts about making the API accept a list of removed 
uuids, because we want to be able to enforce a config of {AB} on node A 
despite a committed config of {ACD} when C and D become dead nodes at some 
point. This also means the tool needs to input the host and port endpoint of 
B, and there are other commands to fetch that info. Currently the uuid 
presence check is overly restrictive because it checks whether the passed 
uuid is in the config, and it accepts only one uuid at the moment. I need to 
add a unit test for the checks I have introduced in this API, i.e. 
simulating user errors and seeing that they fail as expected.
Having the API accept a list of uuids to be added for the new config seems 
like a better idea. This requires either a new RPC taking a RaftConfigPB in 
the request directly, or ChangeConfigRequestPB can take a list of RaftPeerPB 
and the API can construct the new config from that list. The usual 
AddServer/RemoveServer can expect the size of that list to be 1. Let me know 
your thoughts.

3) I refactored a bit, grabbing the consensus state and consensus queue 
state as one snapshot before building the new config and the consensus 
request for Update(). The API currently assumes that dead nodes aren't 
interfering, so this snapshot of the state becomes relevant when we weaken 
those assumptions in future revisions.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-08 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:

PS4, Line 255: ReplaceConfig
there's only one implementation of consensus now, no need to do this.


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS4, Line 1561: ReplaceConfig
this needs a LOG WARNING in big bold letters at several stages.


PS4, Line 1573:   // ReplaceConfig() replaces the config irrespective of opid_index value,
  :   // hence there is no meaning for cas_config_opid_index in this API.
wouldn't this be a great way to make sure that you're changing the actual 
config you want to change? i.e., say the lonely node is stuck with a 
committed config of 10.10; if you make sure that is still the case, you know 
it's safe to update stuff because the config is still what you expect.


PS4, Line 1614:   // in the consensus request later.
  :   int64_t all_replicated_index = queue_->GetAllReplicatedIndex();
  :   OpId last_opid = queue_->GetLastOpIdInLog();
  :   uint64 msg_timestamp = time_manager_->GetSafeTime().value();
I mentioned in a previous rev that these need to be obtained under a lock so 
that they are mutually consistent. The term should be included in this 
"snapshot".
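The reviewer's point can be sketched in a toy model (hypothetical names, not
Kudu's actual queue code): reading each field through its own lock
acquisition lets a writer slip in between reads, while a single acquisition
yields a mutually consistent snapshot.

```python
import threading
from dataclasses import dataclass

@dataclass
class QueueSnapshot:
    term: int
    all_replicated_index: int
    last_opid_index: int

class ConsensusQueueModel:
    def __init__(self):
        self._lock = threading.Lock()
        self.term = 1
        self.all_replicated_index = 10
        self.last_opid_index = 12

    # Racy: three separate acquisitions; a concurrent writer can update
    # the state between any two of them, mixing values from different moments.
    def read_fields_separately(self):
        with self._lock:
            t = self.term
        with self._lock:
            a = self.all_replicated_index
        with self._lock:
            l = self.last_opid_index
        return t, a, l

    # Consistent: one acquisition covers every field, term included.
    def snapshot(self):
        with self._lock:
            return QueueSnapshot(self.term,
                                 self.all_replicated_index,
                                 self.last_opid_index)

q = ConsensusQueueModel()
snap = q.snapshot()
assert (snap.term, snap.all_replicated_index, snap.last_opid_index) == (1, 10, 12)
```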


PS4, Line 1675: state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER
isn't it likely that the lonely node will think it's the leader?


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS4, Line 160: NO_WAIT_FOR_LEADER
nit: DONT_WAIT_FOR_LEADER


PS4, Line 166: WaitForReplicasReportedToMaster
best to make this return a Status::TimeOut


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-08 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS1, Line 1556: RaftConsensus::ReplaceConfig
> TFTR David, just a heads up here that latest patch doesn't reflect all the 
After discussion with Mike yesterday, there are a couple of things we want 
to revise further:
1) Instead of squeezing this functionality into the leader-driven 
ChangeConfig RPC, we will have a new RPC which accepts a 
ReplaceConfigRequestPB containing a bunch of uuids to be removed from the 
config (as per your original suggestion of making the API less 
user-error-prone). We can change these semantics later if we want to.

2) For the pending config change which was replicated to the WAL, we decided 
to relax the "allow-max-one-pending-config-change" criterion for this API. 
This means we may observe 2 replicated CONFIG_CHANGE ops in the log after 
crash recovery: the first was the original one, the second was appended by 
the tool.
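A toy replay (not Kudu's actual bootstrap code; op names and payloads here
are illustrative) shows why two back-to-back CONFIG_CHANGE entries are
benign on recovery: the later entry simply supersedes the earlier one.

```python
def replay_config_changes(wal_entries):
    """wal_entries: ordered (op_type, payload) tuples from the log.
    Returns the config that is active after replay, or None."""
    active = None
    for op_type, payload in wal_entries:
        if op_type == "CONFIG_CHANGE":
            active = payload          # last replicated change wins on replay
    return active

wal = [
    ("WRITE", {"row": 1}),
    ("CONFIG_CHANGE", {"peers": ["A", "B", "C"]}),   # original pending change
    ("CONFIG_CHANGE", {"peers": ["A"]}),             # appended by the tool
]
assert replay_config_changes(wal) == {"peers": ["A"]}
```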

3) The current patch uses the last replicated opid to set the committed 
index field in the consensus request instead of the last committed opid; I 
have fixed that. I still need to see what value we should pass in 
'all_replicated_index' in the consensus request, or whether we need it at 
all.

4) Some misc tooling changes to accommodate 1).

5) I have removed the changes from consensus_queue.[hc], since they were no 
longer necessary after I updated the patch to use a fake leader_uuid.

6) The fake leader uuid string will be replaced by a static id string 
"/bin/kudu" after 1) is in place, and that static id string is supplied by 
the tool itself. My concern here was whether it goes through any uuid sanity 
checks somewhere in the core consensus; testing will confirm this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#5).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas
are down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the
leader in the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during
   this recovery, hence the leader elected in step a) will still be the
   leader when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect.

TODO:
0) Test with 2 pending config changes in the WAL one after another,
   crash, and check that tablet bootstrap is fine.
1) Accept more than one peer to be added from the command line
   to support multiple peers being added in the replaced config.
2) Once 1) is in place, test with a 5-replica config replacing
   the old config {ABCDE} with the new config {AB} on A.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
15 files changed, 620 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:

PS4, Line 255: ReplaceConfig
> there's only one implementation of consensus now, no need to do this.
Thanks, yeah, it should be pure virtual here.


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS4, Line 1561: ReplaceConfig
> this needs a LOG WARNING in big bold letters at several stages.
done.


PS4, Line 1573:   // ReplaceConfig() replaces the config irrespective of opid_index value,
  :   // hence there is no meaning for cas_config_opid_index in this API.
> wouldn't this be a great way to make sure that you're changing the actual c
yeah, it could be; for now the tool doesn't read what's in the config, I 
should add that.


PS4, Line 1614:   // in the consensus request later.
  :   int64_t all_replicated_index = queue_->GetAllReplicatedIndex();
  :   OpId last_opid = queue_->GetLastOpIdInLog();
  :   uint64 msg_timestamp = time_manager_->GetSafeTime().value();
> I mentioned in a previous rev that these need to be obtained under a lock s
The term is obtained from the ReplicaState @L1588, and these fields are 
grabbed from the queue state; each of these member functions grabs a 
spinlock internally. As such I didn't see a reason to collate a queue state 
snapshot here. Or maybe I haven't understood your comment fully.


PS4, Line 1675: state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER
> isn't it likely that the lonely node will think he's the leader?
Actually this whole block of code was removed now for a couple of reasons: 
a) It was only aborting the pending in-memory cmeta config changes and was 
not addressing what to do with the WAL entries. b) It became moot after we 
faked the leader uuid from the tool; i.e., whether this node was a follower 
or a leader, the pending config changes get aborted as part of 
Update()->AbortOpsAfter() based on what is present in the log and what's in 
the consensus request. If this node was the leader, it would step down upon 
seeing the higher term and abort.
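The step-down rule referenced here is the standard Raft one, which a minimal
sketch (illustrative only, not Kudu code) captures: any replica, leader
included, that observes a request carrying a higher term adopts that term
and reverts to follower.

```python
def observe_term(role, current_term, request_term):
    """Apply the Raft term rule to an incoming request."""
    if request_term > current_term:
        return "FOLLOWER", request_term   # step down and adopt the new term
    return role, current_term             # stale or equal term: no change

# A leader at term 5 sees the tool's bumped term 6 and steps down.
assert observe_term("LEADER", 5, 6) == ("FOLLOWER", 6)
# An equal term does not force a step-down.
assert observe_term("LEADER", 5, 5) == ("LEADER", 5)
```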


http://gerrit.cloudera.org:8080/#/c/6066/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS4, Line 160: NO_WAIT_FOR_LEADER
> nit: DONT_WAIT_FOR_LEADER
done


PS4, Line 166: WaitForReplicasReportedToMaster
> best to make this return a Status::TimeOut
Sounds good, tx


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#6).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas
are down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the
leader in the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during
   this recovery, hence the leader elected in step a) will still be the
   leader when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect.

TODO:
0) Test with 2 pending config changes in the WAL one after another,
   crash, and check that tablet bootstrap is fine.
1) Accept more than one peer to be added from the command line
   to support multiple peers being added in the replaced config.
2) Once 1) is in place, test with a 5-replica config replacing
   the old config {ABCDE} with the new config {AB} on A.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
15 files changed, 657 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#7).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas
are down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the
leader in the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during
   this recovery, hence the leader elected in step a) will still be the
   leader when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect.

TODO:
0) Test with 2 pending config changes in the WAL one after another,
   crash, and check that tablet bootstrap is fine.
1) Accept more than one peer to be added from the command line
   to support multiple peers being added in the replaced config.
2) Once 1) is in place, test with a 5-replica config replacing
   the old config {ABCDE} with the new config {AB} on A.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
15 files changed, 657 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(15 comments)

I still need to look through the tests but here are some initial comments

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

PS7, Line 488: 4
You can just number this starting from 1


PS7, Line 492: sender_id
How about caller_uuid or caller_id?


PS7, Line 497: required RaftPeerPB server
Comment needs update?

How about: required RaftConfigPB new_config?


PS7, Line 506: new_config
Is this used for something? Regardless, see my comment above, it seems to me 
that we should take this as input instead of provide it as output.


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

Line 93:   optional bool allow_unsafe = 4;
I think this belongs in ChangeConfigRecordPB


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1148: if (state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER) {
This change can be reverted, right?


Line 1598:   string replica_uuid = peer_pb.permanent_uuid();
> warning: the variable 'replica_uuid' is copy-constructed from a const refer
Based on my comments elsewhere I think this removal logic should probably 
happen on the client side, but you could use RemoveFromRaftConfig() for this


Line 1630:   new_config.add_peers()->CopyFrom(new_peer);
So... we are only allowing changing to a single-node config?


PS7, Line 1651: current_term + 1
please extract this into a new variable for use here and on line 1664


PS7, Line 1665: preceding_opid.index() + 1
Please extract a variable for this so it doesn't look like a magic number. You 
can define it next to the new term variable I mentioned on line 1651.


PS7, Line 1667: msg_timestamp
Don't we need this to be higher?


Line 1672:   LOG_WITH_PREFIX(WARNING)
I don't think all caps is necessary in the server log. If you want to 
print a warning, let's do it in the client tool. I do agree that we should log 
that we did a forced config change, though.


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS7, Line 201: allow_unsafe
I think this should be passed into this function as an enum flag, not part of 
the RaftConfigPB


Line 238:   if (!committed_config.allow_unsafe()) {
This doesn't seem right to me. I think this check should only be skipped if the 
pending config is the one that was forced, since that implies we could commit 
an intermediate pending config that the latest pending config has forced an 
override of. Whether the committed config was forced or not doesn't factor into 
this, does it?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 167: Status WaitForReplicasReportedToMaster(
Doesn't ExternalMiniCluster::WaitForTabletServerCount() already implement this?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS7, Line 226: committed_config
As we discussed offline, it would be an improvement to rename this argument 
'config_to_commit'


PS7, Line 240: pending_config.allow_unsafe
Hmm. I know I suggested moving 'allow_unsafe' to ChangeConfigRecordPB in other 
review comments, but I wonder if this usage makes it difficult to do it... 
Maybe it's not worth the effort to move it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-09 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1328: UNLOCKED
why _UNLOCKED?


PS7, Line 1328: .
nit: missing space after period


Line 1680: s = StatusFromPB(consensus_resp.error().status());
Is there a specific error type you are looking for here?

Should we consider adding tests for failure cases?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 159: TEST_F(AdminCliTest, TestReplaceConfigOnSingleFollower) {
Please add test comments with a brief summary of the test.


PS7, Line 172: find
You need to use equal_range() here: 
https://stackoverflow.com/questions/9046922/unordered-multimap-iterating-the-result-of-find-yields-elements-with-differe
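For reference, the distinction matters because find() hands back an iterator to only one matching element, whereas equal_range() yields the full range of entries for the key. A minimal standalone sketch (names are illustrative, not from the patch):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Collect every value stored under 'key'. unordered_multimap::find()
// returns an iterator to a single matching element; equal_range()
// returns the [first, second) range covering all entries for the key.
std::vector<int> ValuesForKey(
    const std::unordered_multimap<std::string, int>& m,
    const std::string& key) {
  std::vector<int> out;
  auto range = m.equal_range(key);
  for (auto it = range.first; it != range.second; ++it) {
    out.push_back(it->second);
  }
  return out;
}
```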


PS7, Line 193:  TServerDetails* follower1 = nullptr;
 :   TServerDetails* follower2 = nullptr;
 :   vector<TServerDetails*> tservers;
 :   AppendValuesFromMap(active_tablet_servers, &tservers);
 :   for (TServerDetails* ts : tservers) {
 :     if (ts->uuid() != leader_ts->uuid()) {
 :       if (follower1 == nullptr) {
 :         follower1 = ts;
 :       } else {
 :         follower2 = ts;
 :       }
 :     }
 :   }
This is duplicated multiple times in this file. Can you extract out some of 
these helper functions to avoid so much copy and paste?
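One possible shape for such a helper, sketched with plain uuid strings standing in for the TServerDetails* pointers used in the test file (the function name is hypothetical):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the non-leader tablet servers by excluding
// the leader's uuid. The real test would operate on TServerDetails*;
// strings keep this sketch self-contained.
std::vector<std::string> GetFollowerUuids(
    const std::vector<std::string>& all_uuids,
    const std::string& leader_uuid) {
  std::vector<std::string> followers;
  for (const std::string& uuid : all_uuids) {
    if (uuid != leader_uuid) {
      followers.push_back(uuid);
    }
  }
  return followers;
}
```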


Line 224: 
Maybe shut down the master, wait until the number of committed voters is 1, 
then bring up the master and do these other waiting checks.


PS7, Line 255: timeout
How about kTimeout here and elsewhere? Also, I think at least 30 seconds is 
required to avoid flakiness on AWS sometimes.


Line 348: TEST_F(AdminCliTest, TestReplaceConfigOnLeaderWithPendingConfig) {
Can we add a version for a follower with a pending config?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-10 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 8: Code-Review+1

(2 comments)

Left a couple of comments. I like the approach better; thanks for the 
improvements. Letting Mike take it the rest of the way.

http://gerrit.cloudera.org:8080/#/c/6066/8/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS8, Line 1612:   // Take the snapshot of the queue state and timestamp to stick them
  :   // in the consensus request later.
  :   int64_t all_replicated_index = queue_->GetAllReplicatedIndex();
  :   int64 last_committed_index = queue_->GetCommittedIndex();
  :   OpId preceding_opid = queue_->GetLastOpIdInLog();
  :   uint64 msg_timestamp = time_manager_->GetSafeTime().value();
needs to be done under the lock.


PS8, Line 1672: LOG_WITH_PREFIX(WARNING)
  : << "REPLACING THE CONFIG ON THIS SERVER WITH A NEW CONFIG,"
  : << "THIS OPERATION IS IRREVERSIBLE !!\n"
  : << "COMMITTED CONFIG :" << committed_config.DebugString()
  : << "NEW CONFIG :" << new_config.DebugString();
add one like this with the result of the actual change config.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-11 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

PS7, Line 488: 4
> You can just number this starting from 1
Done.


PS7, Line 492: sender_id
> How about caller_uuid or caller_id
Done


PS7, Line 497: required RaftPeerPB server
> Comment needs update?
Done


PS7, Line 506: new_config
> Is this used for something? Regardless, see my comment above, it seems to m
Done


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

Line 93:   optional bool allow_unsafe = 4;
> I think this belongs in ChangeConfigRecordPB
I believe you changed your mind about this suggestion later, if I am correct.


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1148: if (state_->GetActiveRoleUnlocked() != RaftPeerPB::LEADER) {
> This change can be reverted, right?
yeah, I was about to, missed in previous patch. Done.


PS7, Line 1328: .
> nit: missing space after period
Done


PS7, Line 1328: UNLOCKED
> why _UNLOCKED?
Done, yeah that was not intentional.


Line 1563: UnsafeChangeConfigResponsePB *resp,
> warning: parameter 'resp' is unused [misc-unused-parameters]
Done


Line 1598:   string replica_uuid = peer_pb.permanent_uuid();
> warning: the variable 'replica_uuid' is copy-constructed from a const refer
Done


Line 1598:   string replica_uuid = peer_pb.permanent_uuid();
> Based on my comments elsewhere I think this removal logic should probably h
I believe there are substantial changes here based on our offline discussions, 
so please re-review this portion.


Line 1630:   new_config.add_peers()->CopyFrom(new_peer);
> So... we are only allowing changing to a single-node config?
That has changed now, the CLI/API takes the new config itself.


PS7, Line 1651: current_term + 1
> please extract this into a new variable for use here and on line 1664
Done


PS7, Line 1665: preceding_opid.index() + 1
> Please extract a variable for this so it doesn't look like a magic number. 
Done


PS7, Line 1667: msg_timestamp
> Don't we need this to be higher
Higher than what? Can you explain a bit more what you have in mind?


Line 1672:   LOG_WITH_PREFIX(WARNING)
> I don't think it's necessary for all caps in the server log. If you want to
Done


Line 1680: s = StatusFromPB(consensus_resp.error().status());
> Is there a specific error type you are looking for here?
Oh yeah, definitely on my radar; updated the commit_msg accordingly.


http://gerrit.cloudera.org:8080/#/c/6066/8/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS8, Line 1612:   // Take the snapshot of the queue state and timestamp to stick them
  :   // in the consensus request later.
  :   int64_t all_replicated_index = queue_->GetAllReplicatedIndex();
  :   int64 last_committed_index = queue_->GetCommittedIndex();
  :   OpId preceding_opid = queue_->GetLastOpIdInLog();
  :   uint64 msg_timestamp = time_manager_->GetSafeTime().value();
> needs to be done under the lock.
David, sorry, still not following. Did you mean the lock for the timestamp?


PS8, Line 1672: LOG_WITH_PREFIX(WARNING)
  : << "REPLACING THE CONFIG ON THIS SERVER WITH A NEW CONFIG,"
  : << "THIS OPERATION IS IRREVERSIBLE !!\n"
  : << "COMMITTED CONFIG :" << committed_config.DebugString()
  : << "NEW CONFIG :" << new_config.DebugString();
> add one like this with the result of the actual change config.
Actually, the fact that we have written this new_config doesn't mean that the 
cluster came back up and the tablet became operational. The tool throws a 
warning today noting that this operation is asynchronous and a lot can happen 
in the background (a longer wait, etc.). As such, there is nothing else to 
print here apart from the config being overwritten (new_config). Or did you 
have something else in mind?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS7, Line 226: committed_config
> As we discussed offline, it would be an improvement to rename this argument
Done


Line 238:   if (!committed_config.allow_unsafe()) {
> This doesn't seem right to me. I think this check should only be skipped if
Actually, there is a chicken-and-egg kind of issue here; I have tried to make 
this clear in the comments now. We should check for equality of configs only 
when we know both the pending and to-be-committed configs match wrt their 
'unsafe' flag. Let me know if you want the comments to be clearer.


http://gerrit

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-11 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#9).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when we have 2 out of 3
replicas down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node, providing
all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered by the master to bring
the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the leader in
the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master can
   re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during this recovery,
   hence the leader elected in step a) will still be the leader when we see
   the replication factor restored back to 3.
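The term and opid bump described in a) can be sketched as follows (illustrative only, not the patch's actual code; it mirrors the `current_term + 1` / `preceding_opid.index() + 1` values discussed in the review):

```cpp
// Illustrative: the forged consensus request carries a term above the
// survivor's current term and the next log index, so the survivor (and
// any peers re-replicated later) accept the forced config as the most
// recent decision.
struct OpId {
  long term;
  long index;
};

OpId NextForcedConfigOpId(long current_term, long last_logged_index) {
  return OpId{current_term + 1, last_logged_index + 1};
}
```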

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while there is one pending on the log.

This patch is the first in a series for unsafe config changes, and assumes that
the dead servers are not coming back while the new config change is taking
effect.

TODO:
1) Test with 2 pending config changes in the WAL one after another,
   then crash and check that tablet bootstrap is fine.
2) Test with a 5-replica config replacing the old config {ABCDE}
   with a new config {AB} on A.
3) Test exercising all the error cases in the UnsafeChangeConfig API.
4) Test the UnsafeChangeConfig RPC directly without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
18 files changed, 897 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-11 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#10).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when we have 2 out of 3
replicas down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node, providing
all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered by the master to bring
the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the leader in
the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master can
   re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during this recovery,
   hence the leader elected in step a) will still be the leader when we see
   the replication factor restored back to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while there is one pending on the log.

This patch is the first in a series for unsafe config changes, and assumes that
the dead servers are not coming back while the new config change is taking
effect.

TODO:
1) Test with 2 pending config changes in the WAL one after another,
   then crash and check that tablet bootstrap is fine.
2) Test with a 5-replica config replacing the old config {ABCDE}
   with a new config {AB} on A.
3) Test exercising all the error cases in the UnsafeChangeConfig API.
4) Test the UnsafeChangeConfig RPC directly without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
18 files changed, 908 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-16 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#11).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when we have 2 out of 3
replicas down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node, providing
all the details of the new config from the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered by the master to bring
the replica count back up to a 3-node config.

The lonely survivor being operated on by the tool tends to become the leader in
the new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master can
   re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back to an online state.
b) The assumption is that the dead nodes are not coming back during this recovery,
   hence the leader elected in step a) will still be the leader when we see
   the replication factor restored back to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while there is one pending on the log.

This patch is the first in a series for unsafe config changes, and assumes that
the dead servers are not coming back while the new config change is taking
effect.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
18 files changed, 1,073 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-17 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6066/11//COMMIT_MSG
Commit Message:

Line 41: 
A description of the test cases you have implemented would be nice.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 495:   required RaftConfigPB new_config = 4;
Can we also add an optional int64 cas_config_opid_index like in 
ChangeConfigRequestPB?


PS11, Line 499: otherwise 'new_configuration' is set
Update comment


PS11, Line 522: Unsafe
nit: Can we change this to Force instead of Unsafe?


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

PS11, Line 93: allow_unsafe
How about forced_config_change?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1667: pid_index);
> higher than ? can you explain bit more what you have in mind ?
You are using safetime which is in the past. Don't we want the timestamp to be 
Now()? I would like David to take a look at this.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed config on this node.
How about validating that the current node is a voter in the config?


Line 1601:   new_peer.last_known_addr().host() == committed_peer.last_known_addr().host() &&
Do we expect last_known_addr to always be passed in the new config? I thought 
we only required the uuid. See the comment below.


Line 1623:   RaftConfigPB new_config = config;
How about just using a copy of the committed config, but with the uuids we want 
to remove removed?

Something like:

set<string> retained_uuids;
for (const auto& peer : config.peers()) {
  const string& uuid = peer.permanent_uuid();
  if (!IsRaftConfigMember(uuid, committed_config)) {
    return Status::InvalidArgument(...);
  }
  InsertOrDie(&retained_uuids, uuid);
}
RaftConfigPB new_config = committed_config;
for (const auto& peer : committed_config.peers()) {
  const string& uuid = peer.permanent_uuid();
  if (ContainsKey(retained_uuids, uuid)) continue;
  CHECK(RemoveFromRaftConfig(&new_config, uuid));
}

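A self-contained version of that retain-then-remove sketch, with plain structs standing in for RaftConfigPB/RaftPeerPB and hand-rolled stand-ins for the Kudu helpers (IsRaftConfigMember, InsertOrDie, RemoveFromRaftConfig); it illustrates only the shape of the suggestion:

```cpp
#include <set>
#include <string>
#include <vector>

// Stand-ins for RaftPeerPB / RaftConfigPB; illustration only.
struct Peer { std::string uuid; };
struct Config { std::vector<Peer> peers; };

bool IsMember(const std::string& uuid, const Config& c) {
  for (const Peer& p : c.peers) {
    if (p.uuid == uuid) return true;
  }
  return false;
}

// Build the new config as a copy of the committed config minus any peer
// the caller did not ask to retain. Returns false (standing in for
// Status::InvalidArgument) if the requested config names a uuid that is
// not in the committed config.
bool BuildNewConfig(const Config& requested, const Config& committed,
                    Config* new_config) {
  std::set<std::string> retained_uuids;
  for (const Peer& p : requested.peers) {
    if (!IsMember(p.uuid, committed)) return false;
    retained_uuids.insert(p.uuid);
  }
  new_config->peers.clear();
  for (const Peer& p : committed.peers) {
    if (retained_uuids.count(p.uuid) > 0) {
      new_config->peers.push_back(p);
    }
  }
  return true;
}
```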

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

Line 238:   // In the event of unsafe config change enforced from an external tool,
> Actually, there is a chicken-and-egg kinda issue here; I have tried to make
I don't understand that logic. If there are multiple "pending" configs, which 
this condition implies since we are committing something, then the latest 
"pending" config must be forced. Right?


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS11, Line 212: Allowing config update
How about: Allowing forced config change


PS11, Line 239: may not have a pending config
You mean that it may not match the currently-pending config, right?


PS11, Line 242: both
I don't think this will work in all cases; see below


PS11, Line 247: !config_to_commit.allow_unsafe()
As mentioned in the rev 7 thread, I don't think this condition is needed


PS11, Line 248: pending_config.allow_unsafe() && config_to_commit.allow_unsafe()
I think this should be removed due to the following scenario:

remote_replica unsafe_change_config a.b.c.d foo A B C D E
remote_replica unsafe_change_config a.b.c.d foo A B C
remote_replica unsafe_change_config a.b.c.d foo A

The first 2 could still be pending and would need to be committed before the 
last one can be committed.
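In other words, pending config changes commit in log order, so the config being committed at any moment need not be the newest pending one. A tiny model of that ordering (integers stand in for config opid indexes; purely illustrative):

```cpp
#include <deque>

// Model: pending config changes commit oldest-first, so the config
// being committed is the front of the queue, not necessarily the most
// recently appended (forced) one.
struct PendingConfigs {
  std::deque<int> pending;   // opid_index of each pending config, oldest first
  int committed_index = 0;   // opid_index of the committed config

  void Append(int opid_index) { pending.push_back(opid_index); }

  // Commit the oldest pending config and return its opid_index.
  int CommitNext() {
    int opid_index = pending.front();
    pending.pop_front();
    committed_index = opid_index;
    return opid_index;
  }
};
```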


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 224: tablet_id_,
> Because #3->#1->#3 happens asynchronously, we may miss the transition to #1
The master orchestrates the part from #1 -> #2 -> #3, so if the master is dead 
we can wait for #1 to occur.


Line 348: 3, tablet_id_, kTimeout,
> Actually, we can not send in a ChangeConfig API explicitly to a follower be
You can force a leader to step down to be a follower, while it has a pending 
config change, before running the rest of the test. The config change will 
remain pending.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS11, Line 665: fault
I like the idea of using a fault to trigger committing config changes during 
tablet bootstrap.

Can we additionally test this scenario during normal operation (outside of 
bootstrap)? We could do something like this (in pseudocode):


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-17 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed config on this node.
> How about validating that the current node is a voter in the config?
Note: You can use the helper function IsRaftConfigVoter() from quorum_util.h


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-17 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 495:   required RaftConfigPB new_config = 4;
> Can we also add a optional int64 cas_config_opid_index like in ChangeConfig
On second thought, maybe we can defer that since right now we only support 
moving to a subset of the committed config.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/6066/11//COMMIT_MSG
Commit Message:

Line 41: 
> A description of the test cases you have implemented would be nice.
Done


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 495:   required RaftConfigPB new_config = 4;
> On second thought, maybe we can defer that since right now we only support 
agreed, not changing for now.


PS11, Line 499: otherwise 'new_configuration' is set
> Update comment
Done


PS11, Line 522: Unsafe
> nit: Can we change this to Force instead of Unsafe?
I initially started with that naming convention, but changed it to Unsafe only 
because ForceChangeConfig seemed a bit vague to me, since the actual 
ChangeConfig itself is a forced operation from a follower's point of view. 
However, if we want to differentiate between automatic and manual, we could go 
with ManualChangeConfig perhaps?


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

PS11, Line 93: allow_unsafe
> How about forced_config_change?
This can follow the naming convention for the API; since I was suggesting 
'manual' earlier, perhaps manual_config_change?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1667: pid_index);
> You are using safetime which is in the past. Don't we want the timestamp to
Let me poke David to be sure; I am not sure it makes much difference. One of 
the points David raised was that perhaps we shouldn't let the API be exercised 
too soon by the user, i.e., we could have a check somewhere requiring that 
consensus on this node has been stuck for at least, say, 10 minutes, to allow 
room for automatic recovery.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed 
config on this node.
> Note: You can use the helper function IsRaftConfigVoter() from quorum_util.
Done. I didn't use that routine because we can already compare its member type 
directly here. Also, I am explicitly not comparing against a specific membership 
type, because that's something the tool can enforce. This also means we could 
let the user specify the membership type on the CLI.


Line 1601:   new_peer.last_known_addr().host() == 
committed_peer.last_known_addr().host() &&
> Do we expect last_known_addr to always be passed in the new config? I thoug
Yeah, we do. If you recall, there was a reason we chose RaftConfigPB as part of 
the RPC: in the case of a 5-replica original config {ABCDE} where 2 nodes are 
part of the enforced config {AB} being written on node A, we want to be able to 
provide both A's and B's HostPort endpoints as part of the config. The tool 
could query HostPort details from the master, but it wouldn't know whether the 
master's info is stale or current. So for now, we push back on the user to 
provide the complete info instead of having this API or the tool determine that 
info at run time.
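
To illustrate the point about carrying full endpoint info, here is a minimal 
Python sketch (hypothetical dict-based peers, not the actual RaftConfigPB 
message): the forced config keeps each surviving peer's endpoint from the full 
original config rather than re-resolving it from the master.

```python
def build_forced_config(original_peers, surviving_uuids):
    """Select surviving peers, keeping their last-known host/port endpoints."""
    return [p for p in original_peers if p["uuid"] in surviving_uuids]

# 5-replica config {ABCDE}; the enforced config keeps only {A, B}.
peers = [
    {"uuid": "A", "host": "host-a", "port": 7050},
    {"uuid": "B", "host": "host-b", "port": 7050},
    {"uuid": "C", "host": "host-c", "port": 7050},
    {"uuid": "D", "host": "host-d", "port": 7050},
    {"uuid": "E", "host": "host-e", "port": 7050},
]
forced = build_forced_config(peers, {"A", "B"})
```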


Line 1623:   RaftConfigPB new_config = config;
> How about just use a copy of the committed config, but with the uuids we wa
That's a good idea, but given that we are supplying the entire config to the 
API, this is no different from the current code with respect to the end result. 
Did you have any other concern in mind?


http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

Line 238:   // In the event of unsafe config change enforced from an external 
tool,
> I don't understand that logic. If there are multiple "pending" configs, whi
I will move this discussion to the latest patch since this code and its 
comments have changed a bit there, and I see that you have added some comments 
there as well.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS11, Line 212: Allowing config update
> How about: Allowing forced config change
Done


PS11, Line 239: may not have a pending config
> You mean that it may not match the currently-pending config, right?
That's correct, thanks for catching that; updated.


PS11, Line 242: both
> I don't think this will work in all cases; see below
Updated the comment here.


PS11, Line 247: !config_to_commit.allow_unsafe()
> As mentioned in the rev 7 thread, I don't think this condition is needed
That's true; updated.


PS11, Line 248: pending_config.allow_unsafe() && config_to_commit.allow_unsafe()
> I think this should be removed due to the following scenario:
great catch, yeah.. but how can we let the unsa

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/6443

to review the following change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas are
down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config via the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor operated on by the tool tends to become the leader of the
new config in the majority of situations because:
a) The tool acts as a fake leader and generates a consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back online.
b) The assumption is that the dead nodes do not come back during this
   recovery, hence the leader elected in step a) will still be the leader
   when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule so another change_config
op can be appended while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers do not come back while the new config change is taking
effect.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 nodes.
- Unsafe config change on a 5-replica config with 2 nodes in the new config.
- Unsafe config change when there is a pending config on a surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the log.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700

ver 2

Change-Id: Icdf7b962d404ab4a9e838695d60ac2cd9cf3df1a
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
18 files changed, 1,149 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/43/6443/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6443
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icdf7b962d404ab4a9e838695d60ac2cd9cf3df1a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#12).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas are
down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config via the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor operated on by the tool tends to become the leader of the
new config in the majority of situations because:
a) The tool acts as a fake leader and generates a consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back online.
b) The assumption is that the dead nodes do not come back during this
   recovery, hence the leader elected in step a) will still be the leader
   when we see the replication factor restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule so another change_config
op can be appended while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers do not come back while the new config change is taking
effect.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 nodes.
- Unsafe config change on a 5-replica config with 2 nodes in the new config.
- Unsafe config change when there is a pending config on a surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the log.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
18 files changed, 1,149 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has abandoned this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Abandoned

Uploaded to a new link by mistake; the original patch is at 
http://gerrit.cloudera.org:8080/6066/

-- 
To view, visit http://gerrit.cloudera.org:8080/6443
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Icdf7b962d404ab4a9e838695d60ac2cd9cf3df1a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

PS11, Line 522: Unsafe
> I initially started with that naming convention, but changed it to Unsafe o
I guess Force is ambiguous, and Manual seems a little too specific to me. I'm 
okay with leaving this as Unsafe.


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed 
config on this node.
> Done, I didn't use that routine because we already can compare directly its
Still not seeing this check in Rev 12


Line 1601:   new_peer.last_known_addr().host() == 
committed_peer.last_known_addr().host() &&
> yeah we do. If you can recall, there was a reason why we chose RaftConfigPB
I am confused about this comment. We already have them in the committed config. 
Why not just use it from there?


Line 1623:   RaftConfigPB new_config = config;
> That's a good idea, but given that we are supplying the entire config to th
My only concern is that users have to specify all this stuff like ports when 
they don't need to.


http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS12, Line 212: config
Based on the latest discussion in the review comments let's do s/forced/unsafe/


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS11, Line 665: fault
> Sure, just to be sure this has nothing todo with pending config right ? Can
This test is to ensure that we can handle multiple pending config changes, 
since A will not be able to commit any of those successive unsafe config 
changes until the last one comes in, because the rest of the cluster is down.

At the end, A should commit the last config change (with only itself in the 
cluster), and the last committed log index should have incremented by as many 
configs as we appended to its log (plus one for the election no-op).
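
The index arithmetic described above can be written down as a tiny sketch 
(hypothetical Python, just modeling the counting, not the consensus code):

```python
def expected_commit_index(start_committed_index, num_unsafe_configs):
    # The survivor commits every appended unsafe config once the final one
    # lets it win the election, plus one index for the election no-op.
    return start_committed_index + num_unsafe_configs + 1
```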


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

Line 95: Status ParseHostPortString(const std::string& hostport_str,
> No, but IIRC I believe not allowing default port was intentional. We are us
Why don't we just allow the user to specify UUIDs only? We can do that since 
currently we only allow them to remove nodes from the config.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

PS11, Line 93: allow_unsafe
> this can go along with the naming convention for API, since I was suggestin
let's go with unsafe_config_change ?


http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS12, Line 1597: retained_uuids
unused in this patch


http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS12, Line 238: In the event of unsafe config change enforced from an external 
tool
In the event of an unsafe config change triggered by an administrator,


PS12, Line 239:  config 
the config


PS12, Line 240: tool overwrote that pending config with 'unsafe' flag. So only 
the config
  :   // which has the 'unsafe' flag unset both in pending and 
to-be-committed state
  :   // can go through the config equality check.
How about:

... because unsafe config change allows multiple pending configs to exist. 
Therefore we only need to validate that 'config_to_commit' matches the pending 
config if the pending config does not have its 'unsafe_config_change' flag set.
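
As a rough Python model of the rule being proposed (field names hypothetical, 
not the actual RaftConfigPB message):

```python
def config_matches_for_commit(pending_config, config_to_commit):
    # Unsafe config change allows multiple pending configs to exist, so only
    # validate equality when the pending config is NOT marked unsafe.
    if pending_config.get("unsafe_config_change"):
        return True
    return pending_config == config_to_commit
```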


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS12, Line 1652: and pre-elect himself later observing the heartbeat failure 
from the
   :   // non-existent node.
Can we remove this part of the comment?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1667: msg_timestamp
> Lemme poke David to be sure, I am not sure if it makes much difference. One
Regardless, I don't think it's ever legal to create new writes at the safe time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-20 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS11, Line 248:   config_to_commit.SerializeAsString())
> great catch, yeah.. but how can we let the unsafe config go through this eq
I think the code in rev 12 is correct. I am a little confused by this comment; 
can you clarify your question? If the question is how we perform validation, my 
opinion is that we don't, in the case of unsafe pending config changes. This is 
a safety check that is violated by unsafe config change (and that is by design).


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-21 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#13).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas are
down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config via the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered from the master to
bring the replica count back up to a 3-node config.

The lonely survivor operated on by the tool tends to become the leader of the
new config in the majority of situations because:
a) The tool acts as a fake leader and generates a consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back online.
b) The assumption is that the dead nodes do not come back during this
   recovery, hence the leader elected in step a) will still be the leader
   when we see the replication factor restored to 3.
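
The fake-leader write described in step a) amounts to bumping the term and
placing the new config just past the survivor's log. A minimal Python sketch
(hypothetical field names, not the real ConsensusRequestPB/protobuf API):

```python
def make_unsafe_config_op(current_term, last_logged_index, surviving_uuids):
    # Bump the term past anything the survivor has voted in, and append the
    # new config right after the last op in its WAL.
    return {
        "term": current_term + 1,
        "index": last_logged_index + 1,
        "peer_uuids": list(surviving_uuids),
        "unsafe_config_change": True,
    }

op = make_unsafe_config_op(current_term=5, last_logged_index=42,
                           surviving_uuids=["A"])
```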

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule so another change_config
op can be appended while one is pending in the log.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers do not come back while the new config change is taking
effect.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 nodes.
- Unsafe config change on a 5-replica config with 2 nodes in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the node and the one with a 'sane' new config
  brings the tablet back online.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
15 files changed, 1,131 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-21 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 11:

(12 comments)

TFTR Mike for your continued efforts reviewing this patch. Addressed all 
comments below after discussing offline; please take a look.

http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1592:   // Check that passed replica uuids are part of the committed 
config on this node.
> Still not seeing this check in Rev 12
I wanted more clarity here, so we discussed offline; updated now.


Line 1601:   new_peer.last_known_addr().host() == 
committed_peer.last_known_addr().host() &&
> I am confused about this comment. We already have them in the committed con
Done


Line 1623:   RaftConfigPB new_config = config;
> My only concern is that users have to specify all this stuff like ports whe
updated, after the discussion offline.


http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS12, Line 1597:  new_peer : co
> unused in this patch
added the missing code, and also updated comments here.


PS12, Line 1652: 64 replicate_opid_index = preceding_opid.index() + 1;
   :   consensus_req.set_cal
> Can we remove this part of the comment?
Done


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS11, Line 248: pending_config.allow_unsafe() && config_to_commit.allow_unsafe()
> I think the code in rev 12 is correct. I am a little confused by this comme
If we know for sure that config_to_commit (with the unsafe flag set) has a 
pending config which was never overwritten, then we can take that 
config_to_commit through the equality check. E.g., among your example unsafe 
config operations above, the last "remote_replica unsafe_change_config a.b.c.d 
foo A" operation can still go through this equality-check validation and 
succeed.


http://gerrit.cloudera.org:8080/#/c/6066/12/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS12, Line 212: config
> Based on the latest discussion in the review comments let's do s/forced/uns
Done


PS12, Line 238: In the event of unsafe config change enforced from an external 
tool
> In the event of an unsafe config change triggered by an administrator,
Done


PS12, Line 239:  config 
> the config
Done


PS12, Line 240: tool overwrote that pending config with 'unsafe' flag. The new 
config however
  :   // has the 'unsafe' flag set and has a pending config too. So 
only the config
  :   // which has the 'unsafe' flag set both in p
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS11, Line 665: fault
> This test is to ensure that we can handle multiple pending config changes, 
Done


http://gerrit.cloudera.org:8080/#/c/6066/11/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

Line 95: Status ParseHostPortString(const std::string& hostport_str,
> Why don't we just allow the user to specify UUIDs only? We can do that sinc
done, reverted all of these changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/7/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS7, Line 1667: msg_timestamp
> Regardless, I don't think it's ever legal to create new writes at the safe 
Mike is right that getting the safe time is wrong. You likely need to make 
TimeManager::GetSerialTimestamp() callable here (either friending or making it 
public, likely friending is slightly better IMO) to be able to obtain a 
timestamp.

I've also been mentioning for a while that the timestamp, as well as the OpId 
and term, all need to be part of the state snapshot that is taken under the lock.
Let me be more specific: I think the block starting at line 1579 should look like:

int64_t current_term;
RaftConfigPB committed_config;
int64_t all_replicated_index;
int64_t last_committed_index;
OpId preceding_opid;
uint64_t msg_timestamp;
{
  ReplicaState::UniqueLock lock;
  RETURN_NOT_OK(state_->LockForRead(&lock));
  current_term = state_->GetCurrentTermUnlocked();
  committed_config = state_->GetCommittedConfigUnlocked();
  if (state_->IsConfigChangePendingUnlocked()) {
    LOG_WITH_PREFIX_UNLOCKED(WARNING)
        << "Node has a pending config, so new config will be appended. "
        << "Currently pending config on the node: "
        << state_->GetPendingConfigUnlocked().DebugString();
  }
  all_replicated_index = queue_->GetAllReplicatedIndex();
  last_committed_index = queue_->GetCommittedIndex();
  preceding_opid = queue_->GetLastOpIdInLog();
  msg_timestamp = time_manager_->GetSerialTimestamp().value;
}

This way you get a consistent view of the whole state, and an index/timestamp 
misorder is not possible.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 14:

(11 comments)

Mostly minor feedback; still looking carefully at the tests, but wanted to give 
some early feedback.

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

Line 93:   optional bool unsafe_config_change = 4;
do we need to give this a default of false?


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 616: if (!new_config.unsafe_config_change()) {
I think we need a default for this field or we first need to check 
has_unsafe_config_change() everywhere we use it.


PS14, Line 1612: Peer uuid $0 is not found on original
Peer with UUID $0 is not in the committed


PS14, Line 1613: the node
this replica


Line 1617: if (!IsRaftConfigVoter(peer_uuid, committed_config)) {
I don't think this voter check is necessary. What we should do here is check 
that the *local* replica is a voter in a final 'new_config' that we construct 
below.


PS14, Line 1653: This ensures that current node replicates the new config 
proposed,
This makes this request appear to come from a new leader that the local replica 
doesn't know about yet.


PS14, Line 1654: node
local replica


PS14, Line 1654: it would step down before replicating
this will cause it to step down


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS14, Line 257: FLAGS_fault_crash_before_append_commit
Don't reuse this unrelated fault flag for this test. That flag is for the WAL. 
Just define a new one.

Also, do we really need the fault flag to be inside this conditional for the 
test or can we remove the 'if' around this? If so, I think a good place to put 
it would be in ConsensusMetadata::Flush() and we could call it 
fault_crash_before_flush_consensus_meta perhaps.


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
Intentional?


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

PS14, Line 431: Enforce raft config on the tablet server with a new config
Using the word enforce in this context sounds strange to me. How about:

Force the specified replica to adopt a new Raft config. The members of the new 
Raft config must be a subset of (or the same as) the members of the existing 
committed Raft config on that replica.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 14:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS14, Line 199: itest:
nit: Can you add a 'using' clause for this and other uses of itest:: stuff, for 
consistency in this file?


PS14, Line 523: 1
Magic number. Why 1? Also, what if the election is churny? Since we know a 
leader got elected I think we should use GetLastOpIdForReplica() on the leader 
to find out what this baseline should be.


PS14, Line 786: 8 
9


PS14, Line 828: 1
Why 1? Same question as above and for the other tests.


PS14, Line 847: >
If we use >= 0 here then we can delete lines 857-859.


PS14, Line 880: 11
Why 11? Shouldn't it be post_commit opid index from line 828 plus 5 for # of 
unsafe config changes plus 1 for the new election no-op?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 14:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

Line 93:   optional bool unsafe_config_change = 4;
> do we need to give this a default of false?
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 616: if (!new_config.unsafe_config_change()) {
> I think we need a default for this field or we first need to check has_unsa
added default, yeah agreed.


PS14, Line 1612: Peer uuid $0 is not found on original
> Peer with UUID $0 is not in the committed
Done


PS14, Line 1613: the node
> this replica
Done


Line 1617: if (!IsRaftConfigVoter(peer_uuid, committed_config)) {
> I don't think this voter check is necessary. What we should do here is chec
Done, moved that check up in the API since the VOTER type can be deciphered 
much earlier looking at committed config.


PS14, Line 1653: This ensures that current node replicates the new config 
proposed,
> This makes this request appear to come from a new leader that the local rep
Done


PS14, Line 1654: it would step down before replicating
> this will cause it to step down
Done


PS14, Line 1654: node
> local replica
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS14, Line 257: FLAGS_fault_crash_before_append_commit
> Don't reuse this unrelated fault flag for this test. That flag is for the W
Done


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
> Intentional?
Yeah, because we wanted to grab TimeManager::GetSerialTimestamp() as per David's 
last comment, and he also suggested friending would be better than making that 
routine public.


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS14, Line 523: 1
> Magic number. Why 1? Also, what if the election is churny? Since we know a 
I tried this suggestion and found that it is unreliable: there is a window 
between the election result (which finds leader_ts) and the leader committing 
the last opid, so querying GetLastOpId may not help. Besides, 
WaitUntilCommittedOpIdIndexIs() internally invokes GetLastOpId to compare with 
the value we pass in as an argument, so calling GetLastOpId before that is 
redundant. What we essentially need here is WaitUntilAtleastCommittedOpId(1, ...) 
so that we know the replica committed at least the opid we expect (though this 
still doesn't guarantee it is the leader).


PS14, Line 786: 8 
> 9
Done


PS14, Line 828: 1
> Why 1? Same question as above and for the other tests.
I took this logic from an existing test in raft_consensus, so there may be some 
room for improvement here in the event of election churn, but I believe we can 
expect the leader to be stable in the absence of any external events in these 
itests?


PS14, Line 847: >
> If we use >= 0 here then we can delete lines 857-859.
indeed :), good find, done.


PS14, Line 880: 11
> Why 11? Shouldn't it be post_commit opid index from line 828 plus 5 for # o
No, this includes commits of config changes driven by re-replication from the 
master, so 4 new nodes get added, bumping it to 11. But I have removed this 
WaitUntil..() from all the tests at this stage, since I am now just counting 
the config replication count on a new node added by re-replication; i.e., the 
wait at line 881 alone is sufficient.


http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

PS14, Line 431: Enforce raft config on the tablet server with a new config
> Using the word enforce in this context sounds strange to me. How about:
Done, filled the latter under ExtraDescription.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#15).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas are
down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config via the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered from the master to bring
the replica count back up to a 3-node config.

The lone survivor operated on by the tool tends to become the leader of the
new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back online.
b) The assumption is that the dead nodes do not come back during this
   recovery, so the leader elected in step a) will still be the leader
   when the replication factor is restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while there is one pending on the log.

This patch is the first in a series on unsafe config changes, and assumes that
the dead servers do not come back while the new config change is taking
effect.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 nodes.
- Unsafe config change on a 5-replica config with 2 nodes in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the node, and verify that the one with a 'sane' new
  config brings the tablet back online.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
17 files changed, 1,226 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6066/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1605:   if (!IsRaftConfigVoter(local_peer_uuid, committed_config)) {
How about we just move this down to line 1649 and say:

if (!IsRaftConfigVoter(state_->GetPeerUuid(), new_config)) { return 
Status::InvalidArgument(...); }


Line 1633:   if (!ContainsKey(retained_peer_uuids, local_peer_uuid)) {
This seems redundant with the voter check if we do the voter check on the new 
config as suggested above.


PS15, Line 1646: continue;
unnecessary


http://gerrit.cloudera.org:8080/#/c/6066/15/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS15, Line 171: LOG(INFO)
How about VLOG(1) here and below?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
> yeah, because we wanted to grab TimeManager::SerialTimeStamp as per David's
Didn't he suggest providing a public method to do that? Why are we calling a 
private method?


http://gerrit.cloudera.org:8080/#/c/6066/15/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 920: 
Let's keep the master dead until here, then run

  ASSERT_OK(WaitUntilCommittedConfigNumVotersIs(1, leader, tablet_id_, 
kTimeout));

Then restart the master so it can re-replicate for the rest of the recovery 
validation of this test.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
> Didn't he suggest providing a public method to do that? Why are we calling 
I suggested that friending or making it public were both options, and that 
friending would likely be better, since that method was private in the first 
place for a reason (it's unlocked; AssignTimestamp has more checks). I don't 
much like that the whole RaftConsensus class has access to this, though. 
Maybe we could have a friend method to reduce access, or at least make it more 
explicit?


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

Line 132:   friend class RaftConsensus;
> I suggested that friending or making it public were options and that likely
I see. OK, thanks David; in that case I think it's fine to commit as-is and 
make that a TODO item for a follow-up patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#16).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica replace_config'.

This tool lets us replace a 3-node config on a tablet server with a
1-node config. This is particularly useful when 2 out of 3 replicas are
down and we want to bring the tablet back to an operational state.
We can use this tool to force a new config on the surviving node,
providing all the details of the new config via the tool. As a result
of the forced config change, automatic leader election kicks in via
Raft mechanisms, and re-replication is triggered from the master to bring
the replica count back up to a 3-node config.

The lone survivor operated on by the tool tends to become the leader of the
new config in the majority of situations because:
a) The tool acts as a fake leader and generates the consensus update with
   a bumped-up term. The surviving node accepts the request and holds
   a pre-election upon recognizing the leader heartbeat failure.
   This results in the surviving node electing itself as the leader,
   and once the new config update is reported to the master, the master
   can re-replicate the tablet to other healthy nodes in the cluster,
   bringing the tablet back online.
b) The assumption is that the dead nodes do not come back during this
   recovery, so the leader elected in step a) will still be the leader
   when the replication factor is restored to 3.

Also, the ReplaceConfig() API adds a flag to bypass the
'allow-max-one-pending-config-change' rule to append another change_config
op while there is one pending on the log.

This patch is the first in a series on unsafe config changes, and assumes that
the dead servers do not come back while the new config change is taking
effect.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 nodes.
- Unsafe config change on a 5-replica config with 2 nodes in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the node, and verify that the one with a 'sane' new
  config brings the tablet back online.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
17 files changed, 1,210 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 15:

(5 comments)

TFTR again, Mike/David. We could follow up on the TimeManager friend-method 
suggestion in an upcoming patch.

http://gerrit.cloudera.org:8080/#/c/6066/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1605:   if (!IsRaftConfigVoter(local_peer_uuid, committed_config)) {
> How about we just move this down to line 1649 and say:
Done


Line 1633:   if (!ContainsKey(retained_peer_uuids, local_peer_uuid)) {
> This seems redundant with the voter check if we do the voter check on the n
Done


PS15, Line 1646: continue;
> unnecessary
Done


http://gerrit.cloudera.org:8080/#/c/6066/15/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS15, Line 171: LOG(INFO)
> How about VLOG(1) here and below?
Thanks for catching this; it was some leftover debugging.


Line 920: 
> Let's keep the master dead until here, then run
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 16:

(34 comments)

I did one quick nit pass over everything except the tests. I caught a couple of 
minor issues, but this rev mostly asks for changes to comments and log messages. 
Almost there.

http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 494:   // The raft config sent to destination server.
Add: Only the 'permanent_uuid' of each peer in the config is required 
(address-related information is ignored by the server). The peers specified in 
'new_config' are required to be a subset (or equal to) the peers in the 
committed config on the destination replica.


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS16, Line 553: :
nit: add space after colon


PS16, Line 554: :
and here


PS16, Line 777: :
nit: space after colon here and elsewhere in this log message


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1167: :
space after colon here and in the rest of this commit message


PS16, Line 1330:  
nit: Add a period before this space.


PS16, Line 1594: Node
Replica


PS16, Line 1594: so new config will be appended.
but the new config will be unsafely changed anyway.


PS16, Line 1595:  :
": "


Line 1596: << state_->GetPendingConfigUnlocked().DebugString();
Add:
<< ", New config: " << SecureShortDebugString(req.new_config());


PS16, Line 1596: ebugString
use SecureShortDebugString() instead


PS16, Line 1605:  and that node is a VOTER in the config
remove this


PS16, Line 1606: This is to prevent the API allowing an invalid uuid into the 
new config.
Consider changing this to: This allows a manual recovery tool to only have to 
specify the uuid of each replica in the new config without having to know the 
addresses of each server (since we can get the address information from the 
committed config). Additionally, only a subset of the committed config is 
required for typical cluster repair scenarios.


PS16, Line 1616: for tablet $1
for tablet $1. Committed config: $2

...and include the committed config in this message using ShortDebugString()


PS16, Line 1630: we don't want to enforce unsafe config that way
Let's replace this part of the comment with: "it is rare and a replica without 
itself in the latest config is definitely not caught up with the latest 
leader's log."


PS16, Line 1633: committed
new


PS16, Line 1635: tablet $1
tablet $1. Rejected config: $2

and pass in SecureShortDebugString(new_config)


PS16, Line 1639: preceding_opid.index() + 1
Define replicate_opid_index up here and use it here as well as down below.


PS16, Line 1660: leader_term
let's rename this variable 'new_term'


PS16, Line 1680: DVLOG(1) << "Consensus request: "
VLOG_WITH_PREFIX(3) << "UnsafeChangeConfig: Generated consensus request: "


Line 1681:   DVLOG(1) << "Replicate Msg: " << SecureShortDebugString(*replicate);
Remove this line since it's redundant with the line above.


PS16, Line 1684: ,
nit: add space after comma


PS16, Line 1685: DebugString()
SecureShortDebugString() here and below


PS16, Line 1685:  :
space after colon, not before


PS16, Line 1686:  :
here also: space after colon, not before


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS16, Line 241: (
nit: Remove extra pair of parentheses here


PS16, Line 245:   << Substitute("New committed config must equal pending 
config, but does not. "
  :   "Pending config: $0, committed config: $1",
  :   SecureShortDebugString(pending_config),
  :   SecureShortDebugString(config_to_commit));
nit: it seems like the indentation is a little messed up on these 4 lines now. 
Can you re-indent it and fix it to look like it did before?


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 45: #include "kudu/tools/tool_test_util.h"
Is this used? It looks like a test utility library?


PS16, Line 366:   LOG(WARNING) << "NOTE: the new config may be replicated 
asynchronously "
  :<< "to other peers and it may take a while to 
bring the tablet "
  :<< "back to majority replica count depending 
upon how much time "
  :<< "the new peer replicas take to catch up with 
the tablet server "
  :<< "enforcing the new config.";
I think we should remove this message. I think it is confusing because it talks 
about replication when this tool is 

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 16:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 494:   // The raft config sent to destination server.
> Add: Only the 'permanent_uuid' of each peer in the config is required (addr
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS16, Line 553: :
> nit: add space after colon
Done


PS16, Line 554: :
> and here
Done


PS16, Line 777: :
> nit: space after colon here and elsewhere in this log message
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1167: :
> space after colon here and in the rest of this commit message
Done


PS16, Line 1330:  
> nit: Add a period before this space.
Done


PS16, Line 1594: Node
> Replica
Done


PS16, Line 1594: so new config will be appended.
> but the new config will be unsafely changed anyway.
Done


PS16, Line 1595:  :
> ": "
Done


Line 1596: << state_->GetPendingConfigUnlocked().DebugString();
> Add:
It may not be a good idea to display the new config here, especially since what 
the tool passed in is just a list of uuids with default values for the 
RaftConfigPB fields. new_config is built later by looking at committed_config, 
and we display that at L1686, so it's redundant here too.


PS16, Line 1596: ebugString
> use SecureShortDebugString() instead
Done


PS16, Line 1605:  and that node is a VOTER in the config
> remove this
Done


PS16, Line 1606: This is to prevent the API allowing an invalid uuid into the 
new config.
> Consider changing this to: This allows a manual recovery tool to only have 
Done


PS16, Line 1616: for tablet $1
> for tablet $1. Committed config: $2
Done


PS16, Line 1630: we don't want to enforce unsafe config that way
> Let's replace this part of the comment with: "it is rare and a replica with
Done


PS16, Line 1633: committed
> new
OK, I will go with this, but the reason I was apprehensive about using 
new/committed is that the VOTER/NON_VOTER field is read from the committed 
config and the user doesn't have a say in that. Hence an error message quoting 
"not a VOTER in new config" may be misleading.


PS16, Line 1635: tablet $1
> tablet $1. Rejected config: $2
Done


PS16, Line 1639: preceding_opid.index() + 1
> Define replicate_opid_index up here and use it here as well as down below.
Done


PS16, Line 1660: leader_term
> let's rename this variable 'new_term'
Done


PS16, Line 1680: DVLOG(1) << "Consensus request: "
> VLOG_WITH_PREFIX(3) << "UnsafeChangeConfig: Generated consensus request: "
Done


Line 1681:   DVLOG(1) << "Replicate Msg: " << 
SecureShortDebugString(*replicate);
> Remove this line since it's redundant with the line above
Done


PS16, Line 1684: ,
> nit: add space after comma
Done


PS16, Line 1685: DebugString()
> SecureShortDebugString() here and below
Done


PS16, Line 1685:  :
> space after colon, not before
Done


PS16, Line 1686:  :
> here also: space after colon, not before
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS16, Line 241: (
> nit: Remove extra pair of parentheses here
Done


PS16, Line 245:   << Substitute("New committed config must equal pending 
config, but does not. "
  :   "Pending config: $0, committed config: $1",
  :   SecureShortDebugString(pending_config),
  :   SecureShortDebugString(config_to_commit));
> nit: it seems like the indentation is a little messed up on these 4 lines n
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 45: #include "kudu/tools/tool_test_util.h"
> Is this used? It looks like a test utility library?
Yeah, actually I was initially using GetKuduCtlAbsolutePath("/bin/kudu") 
instead of the arbitrary "kudu-tools" as the caller id. I updated the code at 
L381 now.


PS16, Line 366:   LOG(WARNING) << "NOTE: the new config may be replicated 
asynchronously "
  :<< "to other peers and it may take a while to 
bring the tablet "
  :<< "back to majority replica count depending 
upon how much time "
  :<< "the new peer replicas take to catch up with 
the tablet server "
  :<< "enforcing the new config.";
> I think we should remove this message. I think it is confusing because it t
My concern about removing this message is that, user wouldn't know about these 
operations happening in the background and think that th

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#17).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica unsafe_change_config'.

This tool lets us replace an N-replica config on a tablet server with a
new config containing N or fewer replicas. This is particularly useful
when a majority of the replicas are down and for some reason we are not
able to bring the tablet back online using other recovery tools like
'kudu remote_replica copy'. We can use this tool to force a new config on the
surviving replica by providing all the replica uuids of the new config from
the cli tool. As a result of the forced config change, automatic leader
election kicks in via raft mechanisms and re-replication is triggered
from the master (if needed due to an under-replicated tablet) to bring the
replica count of the tablet back up to N.

How the tool brings the tablet back online with the new config:
a) The tool acts as a fake leader and generates a consensus update with
   a bumped-up term along with the new config. The surviving node (leader or
   follower) accepts and replicates the request, then goes through
   a pre-election phase in which a leader is elected among the nodes provided
   in the config. If the new config provides enough VOTERs to win an election,
   the leader election succeeds and the new config will be committed.
   The master can eventually recognize this consensus state change and make
   sure the tablet is re-replicated back to a healthy count if it finds the
   tablet under-replicated.
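The term-bumping step in (a) can be sketched as a toy model. This is an
illustrative Python sketch only; the names (build_unsafe_change_request,
replica_accepts) are hypothetical, and the real logic lives in Kudu's C++
RaftConsensus code:

```python
# Toy model of the "fake leader" request (illustrative; hypothetical names).

def build_unsafe_change_request(last_committed_term, last_logged_index,
                                new_peer_uuids):
    """The tool poses as a leader with a higher term so the survivor accepts it."""
    return {
        "term": last_committed_term + 1,       # bumped-up term
        "opid_index": last_logged_index + 1,   # next slot in the WAL
        "config": {"peers": sorted(new_peer_uuids)},
    }

def replica_accepts(replica_term, request):
    # A Raft replica rejects updates from stale terms; the bumped term
    # guarantees acceptance here.
    return request["term"] >= replica_term

req = build_unsafe_change_request(last_committed_term=4, last_logged_index=27,
                                  new_peer_uuids={"ts-1"})
assert replica_accepts(4, req)
assert req["term"] == 5 and req["opid_index"] == 28
```

Because the replicated op carries a higher term and opid index, peers added
later during re-replication accept the survivor's log as authoritative.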
b) The assumption is that the dead nodes are not coming back during this
   recovery, so the master will very likely choose new healthy live servers
   for re-replication if needed. If the dead nodes come back after the master
   is updated with the new unsafely forced config, the master will delete the
   replicas on those dead nodes via DeleteTablet RPCs because they are no
   longer part of the tablet config.

Also, the UnasfeChangeConfig() API adds a flag to append another change_config
op while there is one pending config in the log. This flag lifts the safety
net around pending configs, which states that there can be at most one pending
config at a time for a given replica.
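The pending-config safety net that this flag lifts can be modeled roughly as
follows. This is an illustrative Python sketch with hypothetical names, not
the actual C++ in raft_consensus_state.cc:

```python
# Toy model of the one-pending-config invariant (hypothetical, illustrative).

class ReplicaState:
    def __init__(self):
        self.pending_config = None

    def set_pending_config(self, config, allow_unsafe=False):
        # Normally at most one config change may be in flight per replica;
        # the unsafe-change-config path overrides this check.
        if self.pending_config is not None and not allow_unsafe:
            raise RuntimeError("config change already pending")
        self.pending_config = config

s = ReplicaState()
s.set_pending_config({"peers": ["A", "B", "C"]})
try:
    s.set_pending_config({"peers": ["A"]})      # normal path: rejected
    raised = False
except RuntimeError:
    raised = True
assert raised
s.set_pending_config({"peers": ["A"]}, allow_unsafe=True)  # recovery path
assert s.pending_config == {"peers": ["A"]}
```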

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect. Future revisions should weaken this assumption and build more
safety guarantees around dead nodes coming back during unsafe config change
operations on the cluster.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 replicas.
- Unsafe config change on a 5-replica config with 2 replicas in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the replica, and verify that the one with a 'sane' new
  config brings the tablet back to an online state.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly, without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
17 files changed, 1,222 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-23 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 16:

(2 comments)

I also massaged the COMMIT_MSG to a good extent and made a couple of minor 
modifications below.

http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1635: tablet $1
> tablet $1. Rejected config: $2
It turned out there is no SecureShortDebugString() member for RaftConfigPB, so 
I used ShortDebugString().


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 45: #include "kudu/tools/tool_test_util.h"
> yeah, actually I was initially using GetKuduCtlAbsolutePath("/bin/kudu") in
On second thought, "kudu-tools" was more readable in logs than "/bin/kudu", so 
I removed this header inclusion and am sticking with "kudu-tools" as the 
caller id.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1635: tablet $1
> It turned out there is no member SecureShortDebugString() for RaftConfigPB 
Use SecureShortDebugString() from pb_util.h


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 17:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6066/17//COMMIT_MSG
Commit Message:

PS17, Line 24: fake
"fake"


PS17, Line 40: UnasfeChangeConfig
UnsafeChangeConfig


http://gerrit.cloudera.org:8080/#/c/6066/17/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS17, Line 778: .
Nit: please also add a space after this period (and all periods)


PS17, Line 780: ,
please add a space after this comma (and all commas)


http://gerrit.cloudera.org:8080/#/c/6066/17/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1597: << state_->GetPendingConfigUnlocked().ShortDebugString();
Use SecureShortDebugString() from pb_util.h:

<< SecureShortDebugString(state_->GetPendingConfigUnlocked());


PS17, Line 1694: committed_config.ShortDebugString()
SecureShortDebugString(committed_config)


PS17, Line 1695: new_config.ShortDebugString()
SecureShortDebugString(new_config)


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

PS16, Line 366:<< "to other peers and it may take a while to 
bring the tablet "
  :<< "back to majority replica count depending 
upon how much time "
  :<< "the new peer replicas take to catch up with 
the tablet server "
  :<< "enforcing the new config.";
  : 
> My concern about removing this message is that, user wouldn't know about th
It's still there in rev 17.

Personally I think if they are using this tool they should know what they're 
doing and I don't think this warning provides enough context. This kind of info 
should go into docs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 16:

(9 comments)

TFTR Mike again, updated now.

http://gerrit.cloudera.org:8080/#/c/6066/17//COMMIT_MSG
Commit Message:

PS17, Line 24: fake
> "fake"
Done


PS17, Line 40: UnasfeChangeConfig
> UnsafeChangeConfig
Done


http://gerrit.cloudera.org:8080/#/c/6066/17/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS17, Line 778: .
> Nit: please also add a space after this period (and all periods)
Done


PS17, Line 780: ,
> please add a space after this comma (and all commas)
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS16, Line 1635: tablet $1
> Use SecureShortDebugString() from pb_util.h
Done


http://gerrit.cloudera.org:8080/#/c/6066/17/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1597: }
> Use SecureShortDebugString() from pb_util.h:
Done


PS17, Line 1694: committed_config.ShortDebugString()
> SecureShortDebugString(committed_config)
Done


PS17, Line 1695: new_config.ShortDebugString()
> SecureShortDebugString(new_config)
Done


http://gerrit.cloudera.org:8080/#/c/6066/16/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

PS16, Line 366:   LOG(WARNING) << "NOTE: the new config may be replicated 
asynchronously "
  :<< "to other peers and it may take a while to 
bring the tablet "
  :<< "back to majority replica count depending 
upon how much time "
  :<< "the new peer replicas take to catch up with 
the tablet server "
  :<< "enforcing the new config.";
> It's still there in rev 17.
Done, removed now, yeah looks like I pressed 'done' before updating this code :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6066

to look at the new patch set (#18).

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..

KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica unsafe_change_config'.

This tool lets us replace an N-replica config on a tablet server with a
new config containing N or fewer replicas. This is particularly useful
when a majority of the replicas are down and for some reason we are not
able to bring the tablet back online using other recovery tools like
'kudu remote_replica copy'. We can use this tool to force a new config on the
surviving replica by providing all the replica uuids of the new config from
the cli tool. As a result of the forced config change, automatic leader
election kicks in via raft mechanisms and re-replication is triggered
from the master (if needed due to an under-replicated tablet) to bring the
replica count of the tablet back up to N.

How the tool brings the tablet back online with the new config:
a) The tool acts as a 'fake' leader and generates a consensus update with
   a bumped-up term along with the new config. The surviving node (leader or
   follower) accepts and replicates the request, then goes through
   a pre-election phase in which a leader is elected among the nodes provided
   in the config. If the new config provides enough VOTERs to win an election,
   the leader election succeeds and the new config will be committed.
   The master can eventually recognize this consensus state change and make
   sure the tablet is re-replicated back to a healthy count if it finds the
   tablet under-replicated.
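The "enough VOTERs to win an election" condition in (a) is the usual Raft
majority rule. A minimal sketch (illustrative Python; the function name is
hypothetical, not Kudu's API):

```python
# Raft majority rule: a candidate wins with strictly more than half of the
# VOTER replicas (illustrative sketch).

def wins_election(num_votes, num_voters):
    return num_votes > num_voters // 2

assert wins_election(1, 1)       # a single survivor can elect itself
assert wins_election(2, 2)       # a forced 2-replica config needs both votes
assert not wins_election(1, 3)   # why the original 3-replica config is stuck
```

This is why forcing a 1-replica config on the survivor immediately unblocks
leader election, while the original config cannot form a majority.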
b) The assumption is that the dead nodes are not coming back during this
   recovery, so the master will very likely choose new healthy live servers
   for re-replication if needed. If the dead nodes come back after the master
   is updated with the new unsafely forced config, the master will delete the
   replicas on those dead nodes via DeleteTablet RPCs because they are no
   longer part of the tablet config.

Also, the UnsafeChangeConfig() API adds a flag to append another change_config
op while there is one pending config in the log. This flag lifts the safety
net around pending configs, which states that there can be at most one pending
config at a time for a given replica.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect. Future revisions should weaken this assumption and build more
safety guarantees around dead nodes coming back during unsafe config change
operations on the cluster.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 replicas.
- Unsafe config change on a 5-replica config with 2 replicas in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the replica, and verify that the one with a 'sane' new
  config brings the tablet back to an online state.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly, without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
17 files changed, 1,216 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/6066/18
-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700

[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


Patch Set 19: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-24 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1330: Add a tool to unsafely recover from loss of majority 
replicas
..


KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

This patch adds an API to allow unsafe config change via an external
recovery tool 'kudu remote_replica unsafe_change_config'.

This tool lets us replace an N-replica config on a tablet server with a
new config containing N or fewer replicas. This is particularly useful
when a majority of the replicas are down and for some reason we are not
able to bring the tablet back online using other recovery tools like
'kudu remote_replica copy'. We can use this tool to force a new config on the
surviving replica by providing all the replica uuids of the new config from
the cli tool. As a result of the forced config change, automatic leader
election kicks in via raft mechanisms and re-replication is triggered
from the master (if needed due to an under-replicated tablet) to bring the
replica count of the tablet back up to N.

How the tool brings the tablet back online with the new config:
a) The tool acts as a 'fake' leader and generates a consensus update with
   a bumped-up term along with the new config. The surviving node (leader or
   follower) accepts and replicates the request, then goes through
   a pre-election phase in which a leader is elected among the nodes provided
   in the config. If the new config provides enough VOTERs to win an election,
   the leader election succeeds and the new config will be committed.
   The master can eventually recognize this consensus state change and make
   sure the tablet is re-replicated back to a healthy count if it finds the
   tablet under-replicated.
b) The assumption is that the dead nodes are not coming back during this
   recovery, so the master will very likely choose new healthy live servers
   for re-replication if needed. If the dead nodes come back after the master
   is updated with the new unsafely forced config, the master will delete the
   replicas on those dead nodes via DeleteTablet RPCs because they are no
   longer part of the tablet config.
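The master-side cleanup in (b) boils down to a membership check against the
committed config. A rough illustrative sketch in Python (hypothetical names,
not the actual catalog-manager code):

```python
# Replicas that report in but are absent from the committed config are
# candidates for a DeleteTablet RPC (illustrative sketch).

def replicas_to_delete(reported_replica_uuids, committed_config_uuids):
    return [uuid for uuid in reported_replica_uuids
            if uuid not in committed_config_uuids]

# Dead nodes ts-2 and ts-3 come back after the forced 1-replica config
# was re-replicated onto ts-4 and ts-5:
stale = replicas_to_delete(["ts-1", "ts-2", "ts-3"],
                           {"ts-1", "ts-4", "ts-5"})
assert stale == ["ts-2", "ts-3"]
```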

Also, the UnsafeChangeConfig() API adds a flag to append another change_config
op while there is one pending config in the log. This flag lifts the safety
net around pending configs, which states that there can be at most one pending
config at a time for a given replica.

This patch is the first in a series for unsafe config changes, and assumes
that the dead servers are not coming back while the new config change is
taking effect. Future revisions should weaken this assumption and build more
safety guarantees around dead nodes coming back during unsafe config change
operations on the cluster.

Tests associated with this patch:
- Unsafe config change when there is one follower survivor in the cluster.
- Unsafe config change when there is one leader survivor in the cluster.
- Unsafe config change when the unsafe config contains 2 replicas.
- Unsafe config change on a 5-replica config with 2 replicas in the new config.
- Unsafe config change when there is a pending config on the surviving leader.
- Unsafe config change when there is a pending config on a surviving follower.
- Unsafe config change when there are back-to-back pending configs in the WAL,
  and verify that the tablet bootstraps fine.
- Test back-to-back unsafe config changes when there are multiple pending
  configs present on the replica, and verify that the one with a 'sane' new
  config brings the tablet back to an online state.

TODO:
1) Test exercising all the error cases in the UnsafeChangeConfig API.
2) Test the UnsafeChangeConfig RPC directly, without going via the external tool.

Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Reviewed-on: http://gerrit.cloudera.org:8080/6066
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
17 files changed, 1,216 insertions(+), 111 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
Gerrit-PatchSet: 20

Re: [kudu-CR] KUDU-1330: Add a tool to unsafely recover from loss of majority replicas

2017-03-11 Thread Dinesh Bhat
We can ignore this build failure; Linux wasn't happy (OS X was) due to the 
absence of the std:: namespace prefix for 'string' in an include file.
I re-uploaded the patch.

> On Mar 11, 2017, at 2:54 AM, Kudu Jenkins (Code Review)  
> wrote:
> 
> Kudu Jenkins has posted comments on this change.
> 
> Change subject: KUDU-1330: Add a tool to unsafely recover from loss of 
> majority replicas
> ..
> 
> 
> Patch Set 9: Verified-1
> 
> Build Failed 
> 
> http://104.196.14.100/job/kudu-gerrit/6977/ : FAILURE
> 
> -- 
> To view, visit http://gerrit.cloudera.org:8080/6066
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
> 
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I908d8c981df74d56dbd034e72001d379fb314700
> Gerrit-PatchSet: 9
> Gerrit-Project: kudu
> Gerrit-Branch: master
> Gerrit-Owner: Dinesh Bhat 
> Gerrit-Reviewer: David Ribeiro Alves 
> Gerrit-Reviewer: Dinesh Bhat 
> Gerrit-Reviewer: Kudu Jenkins
> Gerrit-Reviewer: Mike Percy 
> Gerrit-Reviewer: Tidy Bot
> Gerrit-HasComments: No