Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states ......................................................................
Patch Set 26: (10 comments) Note: a significant portion of this revision and those prior have been moved to a separate patch. https://gerrit.cloudera.org/#/c/7784/ http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: Line 36 > How come we don't need this? Lo and behold, we do! Surprised this wasn't caught by iwyu. http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS26, Line 540: vector<PathInstanceMetadataFile*> instances; : instances.resize(path_set.all_paths_size()); > I think there's a constructor that resizes. Done PS26, Line 587: path_set.all_paths(uuid_idx) > Capture this in a local cref? Done PS26, Line 605: path_set_.Swap(&path_set); : auto path_set_reset = MakeScopedCleanup([&] { : path_set_.Swap(&path_set); : }); > This is strange. Can't you parameterize the path set for UpdateInstanceFile Done Line 662: string healthy_path = path_set_.all_paths(uuid_idx).path(); > Not used? Done Line 670: set<int> indices_to_update(std::move(updated_uuid_indices)); > Why not just operate on updated_uuid_indices directly? It was passed by val Done PS26, Line 689: indices_to_update.erase(idx_to_update); > Seems like you could do this right at .begin() and avoid duplicating it. Done Line 909: "opened in read-only mode"; > Nit: funky indentation here. Done Line 917: CHECK_NE(failed_data_dirs_.size(), data_dirs_.size()) << "All data dirs have failed"; > Also surprised to see this here; why not let the higher levels (maintenance The behavior would be the same throughout: if all of our disks have failed, there's not much we can do. http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS26, Line 2183: // Note: disk failures when opening the block manager need only mark the : // directory as failed, rather than running callbacks to shut down tablets. At : // this point, the tablet manager is not initialized, and no such callback : // exists. > This is pretty fragile; what if Repair() is refactored and some of the logi Done -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes