Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13384 )
Change subject: hms: fix flakiness of master-stress-test from managed tables ...................................................................... Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/13384/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13384/1//COMMIT_MSG@22 PS1, Line 22: is retried by B I would think it's being retried by the client that initiated the operation, no? http://gerrit.cloudera.org:8080/#/c/13384/1/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/13384/1/src/kudu/integration-tests/master-stress-test.cc@279 PS1, Line 279: if (hms_client_ && s.IsRemoteError()) { : // Check that the table was not created in Kudu. : client::sp::shared_ptr<KuduTable> table; : s = client_->OpenTable(to_create, &table); : CHECK(s.IsNotFound()) << s.ToString(); : : // Check that the table was not created in the HMS. : Slice db; : Slice table_name; : CHECK_OK(ParseHiveTableIdentifier(to_create, &db, &table_name)); : hive::Table hms_table; : Status s = hms_client_->GetTable(db.ToString(), table_name.ToString(), &hms_table); : CHECK(s.IsNotFound()) << s.ToString(); I'm curious whether this piece might inadvertently hide issues in other scenarios. Is there anything specific in the Status::RemoteError() in this case that we could use to make sure this outcome has happened due to the leader change and nothing else? http://gerrit.cloudera.org:8080/#/c/13384/1/src/kudu/integration-tests/master-stress-test.cc@280 PS1, Line 280: // Check that the table was not created in Kudu. : client::sp::shared_ptr<KuduTable> table; : s = client_->OpenTable(to_create, &table); : CHECK(s.IsNotFound()) << s.ToString(); : : // Check that the table was not created in the HMS. : Slice db; : Slice table_name; : CHECK_OK(ParseHiveTableIdentifier(to_create, &db, &table_name)); : hive::Table hms_table; : Status s = hms_client_->GetTable(db.ToString(), table_name.ToString(), &hms_table); : CHECK(s.IsNotFound()) << s.ToString(); Another this I'm curious is whether this check catches all the outcomes of the race described in the commit message. For example, is it possible that by the time of this check the entry is still in the HMS and hasn't been rolled back by the scoped cleanup in the former master, and the table is seen as not-yet-existing in Kudu by the newly elected leader master? -- To view, visit http://gerrit.cloudera.org:8080/13384 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1233e5b133035d36f98ce08265c0273570caa454 Gerrit-Change-Number: 13384 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 21 May 2019 06:13:57 +0000 Gerrit-HasComments: Yes