[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-07 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG
Commit Message:

PS3, Line 7: evcited
> haven't looked at this code yet, but I can't tell from the commit message w
Todd, yeah this fix aimed to evict the replicas returning TABLET_NOT_RUNNING. I 
had looked at your comments on the JIRA before attempting a fix on this. I may 
have misunderstood the bug little bit here. My understanding was that tablet 
server would return TABLET_NOT_RUNNING error once after tablet has reached a 
"steady" state of FAILED or some other error state. It appears like we return 
TABLET_NOT_RUNNING when tablet is anything other than RUNNING, so bootstrapping 
would fall under this too. We definitely don't want to evict when tablet is 
bootstrapping (until we hit a timeout ? ).

As a solution, a) would it be better to introduce another response code here ? 
something like TABLET_BOOTSTRAPPING which is an indication for consensus to not 
to evict such replicas from config until some timeout, and treat 
TABLET_NOT_RUNNING as a fatal error ? 
b) Or else we could treat TABLET_NOT_RUNNING as a transient error code and 
after 300 secs (sufficient window to copy large tablets given the improvements 
in upcoming tablet copy workflows ?) we could evict the replica from config.


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG
Commit Message:

PS3, Line 7: evcited
> evicted
haven't looked at this code yet, but I can't tell from the commit message what 
the effect of this change is.

What's the change being made? You're fixing a bug such that now tablets _will_ 
be evicted for TABLET_NOT_RUNNING? Or making it so that they won't be?

I think not evicting is the correct behavior for the case when a tablet is in 
BOOTSTRAPPING state. Otherwise restarting a node (as in rolling restart) would 
cause all of its replicas to be evicted and cause big issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(4 comments)

Again, will defer to Mike and David.

http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 818: TEST_F(ConsensusQueueTest, TestTriggerTabletCopyIfTabletNotRunning) {
This looks like it's exactly the same test as the one below, but with a 
different code. Could you consolidate all of the shared code? If this means 
running both "tests" in a single TEST_F(), that's fine too.


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2772:   // A new good copy should get created because follower returns
Ah, this addresses one of my comments from your earlier patch. It looks like 
this hunk belonged in that patch. Can you move it?


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 574: Status TSTabletManager::StartTabletStateTransitionUnlocked(
These changes affect the semantics of DeleteTablet() and StartTabletCopy() 
somewhat, right? AFAICT it's now possible for DeleteTablet() to return 
ALREADY_INPROGRESS, and for StartTabletCopy() to return TABLET_NOT_RUNNING.

I don't think that's necessarily wrong, I just want to make sure you've thought 
through the side effects of that. How will the RPC callers of DeleteTablet() 
and StartTabletCopy() react to those errors?


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS3, Line 207: replica
Nit: the rest of this comment refers to "tablet" when describing a "tablet 
replica", so please do the same for the new section you added.


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(3 comments)

The JIRA (KUDU-1407) talks about evicting a failed replica, but here you are 
tablet copying a failed replica, so having some discussion about the design of 
this in the commit message would help.

http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG
Commit Message:

PS3, Line 7: evcited
evicted


Line 9: Fixes replica eviction failure where a follower returning
Please provide a detailed description of how you are fixing this problem and 
what the different cases are


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/tablet_server_test_util.cc
File src/kudu/tserver/tablet_server_test_util.cc:

Line 34: const char* TabletServerTestBase::kTableId = "TestTable";
Shouldn't we put this in a new file tablet_server-test-base.cc ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-04 Thread Dinesh Bhat (Code Review)
Hello Mike Percy, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5352

to look at the new patch set (#3).

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..

[consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING

Fixes replica eviction failure where a follower returning
TABLET_NOT_RUNNING is not evicted from the consensus config.

Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server_test_util.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 97 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-04 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5352

to look at the new patch set (#2).

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..

[consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING

Fixes replica eviction failure where a follower returning
TABLET_NOT_RUNNING is not evicted from the consensus config.

Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server_test_util.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 95 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-04 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Mike Percy,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5352

to review the following change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..

[consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING

Fixes replica eviction failure where a follower returning
TABLET_NOT_RUNNING is not evicted from the consensus config.

Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server_test_util.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 91 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy