Mike Percy has posted comments on this change.

Change subject: disk failure: don't open tablets on failed disks
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7766/3//COMMIT_MSG
Commit Message:

PS3, Line 21: Testing is done by loading data into a cluster with multi-disk
            : servers, failing a single directory of one of the servers, and ensuring
            : that the tablets spread across the failed disk get replicated upon the
            : next startup.
How about: Testing is done by loading data into a cluster configured to use
multiple directories for data blocks, failing a single directory on one of the
tablet servers, and ensuring that the tablets with blocks on the failed
directory get re-replicated at startup time.


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 1702:       return Status::OK();
I'm not sure why we are returning OK here. Also, new API semantics should be 
documented at the interface level.


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

PS3, Line 43: TabletServerIntegrationTestBase
Would you mind having this class inherit from ExternalMiniClusterITestBase
instead? The newer tests inherit from it.


PS3, Line 96: server
a tablet server


PS3, Line 97: server
tablet server


PS3, Line 98: .
while it is shut down.


Line 109:   write_workload.Setup();
This creates a table. Why are you creating the table yourself above?


Line 110:   write_workload.Start();
You should call write_workload.StopAndJoin() at some point during the test to
shut the writer thread down again. Did you want it running this whole time?
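
To illustrate the stop-and-join pattern suggested above, here is a minimal, self-contained sketch. FakeWorkload and all of its members are hypothetical stand-ins written for this example; it is not Kudu's workload class and only assumes a workload that runs a background writer thread.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a write workload (an assumption for illustration,
// not the class used in the test under review).
class FakeWorkload {
 public:
  // Stop the writer on destruction so a forgotten StopAndJoin() cannot leave
  // a joinable thread behind.
  ~FakeWorkload() { StopAndJoin(); }

  // Starts a background thread that "writes" until told to stop.
  void Start() {
    should_run_.store(true);
    writer_ = std::thread([this] {
      while (should_run_.load()) {
        rows_written_.fetch_add(1);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Signals the writer to stop and waits for it to exit. Safe to call twice.
  void StopAndJoin() {
    should_run_.store(false);
    if (writer_.joinable()) {
      writer_.join();
    }
  }

  long rows_written() const { return rows_written_.load(); }

 private:
  std::atomic<bool> should_run_{false};
  std::atomic<long> rows_written_{0};
  std::thread writer_;
};
```

In a test, the call to StopAndJoin() would go right before the step that needs the cluster to be quiet (for example, before shutting a server down), so that no writes race with the failure injection.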


PS3, Line 114: WaitForTSAndReplicas
What is the purpose of calling this function?


PS3, Line 124:   NO_FATALS(SetServerSurvivalFlags(ext_tservers));
> why is this not set on boot?
agree


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 765:     LOG(ERROR) << "Exiting bootstrapping early; tablet is in a failed directory";
How about: LOG(ERROR) << LogPrefix(tablet_id) << "aborting tablet bootstrap: tablet has data in a failed directory";


-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes
