[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Dinesh Bhat has posted comments on this change. Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG Commit Message: PS3, Line 7: evcited > haven't looked at this code yet, but I can't tell from the commit message w Todd, yeah this fix aimed to evict the replicas returning TABLET_NOT_RUNNING. I had looked at your comments on the JIRA before attempting a fix on this. I may have misunderstood the bug little bit here. My understanding was that tablet server would return TABLET_NOT_RUNNING error once after tablet has reached a "steady" state of FAILED or some other error state. It appears like we return TABLET_NOT_RUNNING when tablet is anything other than RUNNING, so bootstrapping would fall under this too. We definitely don't want to evict when tablet is bootstrapping (until we hit a timeout ? ). As a solution, a) would it be better to introduce another response code here ? something like TABLET_BOOTSTRAPPING which is an indication for consensus to not to evict such replicas from config until some timeout, and treat TABLET_NOT_RUNNING as a fatal error ? b) Or else we could treat TABLET_NOT_RUNNING as a transient error code and after 300 secs (sufficient window to copy large tablets given the improvements in upcoming tablet copy workflows ?) we could evict the replica from config. -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Todd Lipcon has posted comments on this change. Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG Commit Message: PS3, Line 7: evcited > evicted haven't looked at this code yet, but I can't tell from the commit message what the effect of this change is. What's the change being made? You're fixing a bug such that now tablets _will_ be evicted for TABLET_NOT_RUNNING? Or making it so that they won't be? I think not evicting is the correct behavior for the case when a tablet is in BOOTSTRAPPING state. Otherwise restarting a node (as in rolling restart) would cause all of its replicas to be evicted and cause big issues. -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Adar Dembo has posted comments on this change. Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. Patch Set 3: (4 comments) Again, will defer to Mike and David. http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/consensus/consensus_queue-test.cc File src/kudu/consensus/consensus_queue-test.cc: Line 818: TEST_F(ConsensusQueueTest, TestTriggerTabletCopyIfTabletNotRunning) { This looks like it's exactly the same test as the one below, but with a different code. Could you consolidate all of the shared code? If this means running both "tests" in a single TEST_F(), that's fine too. http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: Line 2772: // A new good copy should get created because follower returns Ah, this addresses one of my comments from your earlier patch. It looks like this hunk belonged in that patch. Can you move it? http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 574: Status TSTabletManager::StartTabletStateTransitionUnlocked( These changes affect the semantics of DeleteTablet() and StartTabletCopy() somewhat, right? AFAICT it's now possible for DeleteTablet() to return ALREADY_INPROGRESS, and for StartTabletCopy() to return TABLET_NOT_RUNNING. I don't think that's necessarily wrong, I just want to make sure you've thought through the side effects of that. How will the RPC callers of DeleteTablet() and StartTabletCopy() react to those errors? http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: PS3, Line 207: replica Nit: the rest of this comment refers to "tablet" when describing a "tablet replica", so please do the same for the new section you added. -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Mike Percy has posted comments on this change. Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. Patch Set 3: (3 comments) The JIRA (KUDU-1407) talks about evicting a failed replica, but here you are tablet copying a failed replica, so having some discussion about the design of this in the commit message would help. http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG Commit Message: PS3, Line 7: evcited evicted Line 9: Fixes replica eviction failure where a follower returning Please provide a detailed description of how you are fixing this problem and what the different cases are http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/tablet_server_test_util.cc File src/kudu/tserver/tablet_server_test_util.cc: Line 34: const char* TabletServerTestBase::kTableId = "TestTable"; Shouldn't we put this in a new file tablet_server-test-base.cc ? -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Hello Mike Percy, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5352 to look at the new patch set (#3). Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING Fixes replica eviction failure where a follower returning TABLET_NOT_RUNNING is not evicted from the consensus config. Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 97 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/3 -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Hello David Ribeiro Alves, Mike Percy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5352 to look at the new patch set (#2). Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING Fixes replica eviction failure where a follower returning TABLET_NOT_RUNNING is not evicted from the consensus config. Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 95 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/2 -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING
Hello David Ribeiro Alves, Mike Percy, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5352 to review the following change. Change subject: [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING .. [consensus] KUDU-1407 replica is not evcited when TABLET_NOT_RUNNING Fixes replica eviction failure where a follower returning TABLET_NOT_RUNNING is not evicted from the consensus config. Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 91 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/5352/1 -- To view, visit http://gerrit.cloudera.org:8080/5352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy