Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 )
Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync ...................................................................... Patch Set 7: (5 comments) The current solution looks good to me. Could you also add test cases for the LocalCatalog mode? You can use anotation like this to enable it: @CustomClusterTestSuite.with_args( impalad_args="--use_local_catalog=true", catalogd_args="--catalog_topic_mode=minimal") http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@a2142 PS7, Line 2142: Can we keep this? If we wait for the next statestore update, this won't cause trouble now. In my understanding, this is the only difference we can use to distinguish DDL updates from statestore updates. http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2144 PS7, Line 2144: while (cur_service_id == catalog_update_info_.catalog_service_id) { Could you add a log before the while-loop? E.g. Catalog service id mismatch. Current id: xxx. Id in response: yyy. Catalogd may be restarted. Waiting for new catalog update from statestore. http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2150 PS7, Line 2150: if (cur_service_id == catalog_service_id) { I think this captures the case that catalogd restart again and we get another catalog_service_id from statestore update. Could you add a warning log in the else-clause? E.g. Ignoring catalog update result of catalog service id: xxx. The previous catalog service id is yyy. Current catalog service id is zzz. Catalogd may be restarted more than once. http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py@189 PS4, Line 189: self.execute_query_expect_success(self.client, "select name from join_aa") : self.exec > I prefer Solution 2, this kind of modification is smaller and safer Yeah, agree on these. http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py@169 PS7, Line 169: def test_restart_catalogd(self): This is already a good test case. Can we add another one to cover the case that catalogd is restarted twice? -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao <liu...@sensorsdata.cn> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: liuyao <liu...@sensorsdata.cn> Gerrit-Comment-Date: Thu, 22 Jul 2021 00:54:20 +0000 Gerrit-HasComments: Yes