Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19571 )
Change subject: KUDU-3452 Create tablet without enough healthy tservers ...................................................................... Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/master/catalog_manager.cc@418 PS10, Line 418: enough > nit: not enough Yingchun, to me it looks correct if you read the whole sentence: ... enough healthy tablet servers ... to place just the majority of tablet replicas http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/master/catalog_manager.cc@6663 PS10, Line 6663: larger nit: greater http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9196 PS10, Line 9196: const nit: this could be constexpr as well http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9245 PS10, Line 9245: SleepFor(MonoDelta::FromMilliseconds(6000)); > It would be better to check the 3 tservers are actually shutdown successfll IIUC, ExternalDaemon::Shutdown() actually awaits for the process to be stopped/exited, so I'm not sure I understand why it's necessary to wait here. Probably, the idea is to wait until catalog manager (i.e. master) sees just 2 tablet servers alive instead of 5, as Yingchun mentioned? If so, you can wait for some state of the mini-cluster (or a genetic condition) using ASSERT_EVENTUALLY(). http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9247 PS10, Line 9247: no nit: not http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9267 PS10, Line 9267: Restart 2 For a tablet with RF=5, for a positive scenario we are interested to see if the system allows to create a table when only 3 healthy tablet servers are present. Here we have 4 instead of 3 if stating back 2 tablet servers. http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9272 PS10, Line 9272: SleepFor(MonoDelta::FromMilliseconds(4000)); > Same, check the 4 tservers are actually alive. +1 http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9296 PS10, Line 9296: alter table Altering the replication factor is not a 'regular' sort of ALTER TABLE. Could you also add a 'regular' ALTER TABLE operation, e.g., adding a column for a table that's currently under-replicated? Does it fail (timeout, etc.) or succeed? Thanks! http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9299 PS10, Line 9299: will nit: will be Also, what table? At this point there are at least two tables. So, is kOneReplicaTableName table expected to be under-replicated? http://gerrit.cloudera.org:8080/#/c/19571/10/src/kudu/tools/kudu-tool-test.cc@9325 PS10, Line 9325: out.clear(); nit: it's much cleaner to use variables local to the scope instead of relying on the out-of-the scope variables -- consider declaring a new 'out' variable local to this scope -- To view, visit http://gerrit.cloudera.org:8080/19571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I742ba1ff770f5c8b1be5800334c29bec96e195c6 Gerrit-Change-Number: 19571 Gerrit-PatchSet: 10 Gerrit-Owner: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: KeDeng <kdeng...@gmail.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com> Gerrit-Reviewer: Yingchun Lai <laiyingc...@apache.org> Gerrit-Reviewer: Yuqi Du <shenxingwuy...@gmail.com> Gerrit-Comment-Date: Mon, 24 Apr 2023 22:18:08 +0000 Gerrit-HasComments: Yes