Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20276 )
Change subject: IMPALA-12321: Fix the race condition for updating active catalogd ...................................................................... Patch Set 2: (14 comments) http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@10 PS1, Line 10: on two different events: : catalogd registration, and new a > nit: "on two different events: catalogd registration, and a new active cata Done http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@12 PS1, Line 12: testored sends the active c > nit: what is the difference? the value of 'for_registration' parameter? One is StatestoreService.RegisterSubscriber from subscriber to statestore, another one is StatestoreSubscriber.UpdateCatalogd from statestore to subscriber. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@163 PS1, Line 163: catalog_objects_to/from_version_ > nit: is this refer to catalog_objects_max_version_? yes http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@171 PS1, Line 171: bool last_update_for_registration_ = false; > This is assigned in CatalogServer::UpdateActiveCatalogd, but never used any Should be used in CatalogServer::UpdateActiveCatalogd(). Fixed. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@499 PS1, Line 499: is_registration_ > nit: I think 'is_registration_reply' is more appropriate. Done http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@523 PS1, Line 523: last_update_for_registration_ = false; > nit: can move this after line 510 and delete the duplicate in line 520. last_update_for_registration_ should be used in in line #512. Fixed. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h File be/src/runtime/exec-env.h: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h@330 PS1, Line 330: /// True if the last update of active catal > This is assigned in ExecEnv::UpdateActiveCatalogd, but never used anywhere Thanks to catch this. It should be used in ExecEnv::UpdateActiveCatalogd(). Fixed http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc@692 PS1, Line 692: last_update_for_registration_ = false; > nit: can move this after line 679 and delete the duplicate in line 689. last_update_for_registration_ should be used in line #681. Fixed. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc File be/src/statestore/statestore-catalogd-mgr.cc: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc@126 PS1, Line 126: DCHECK(num_registered_catalogd_ == 2); > Question: If I maliciously register more than 2 CatalogD, will statestore c Yes. CatalogD HA is controlled by central manager or k8s server in production deployment. No more than two catalogd instances are created when HA is enabled. Otherwise, it's bug in the central management. For unit-test, two catalogd instances are created by script start-impale-cluster.py when it's ran with option 'enable_catalogd_ha'. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h File be/src/statestore/statestore-subscriber.h: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h@116 PS1, Line 116: /// for two events: receive registration RPC reply, receive UpdateCatalogD RPC request. : /// is_registration_reply: set it as true when the function is called for the > Can we document the parameters here? I'd like to get clarification on: done. http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc File be/src/statestore/statestore-subscriber.cc: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@593 PS1, Line 593: // registration. Note that a > Can we make sure that StateStore should never return negative active_catalo Added DCHECK http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@667 PS1, Line 667: nst Status& status = CheckRegistrationId(registration_id); : if (status.ok()) { : // Try to acquire lock to avoid race with updating catalogd from registration thread. : s > nit: Is it expected that StateStore will keep retrying and subscriber will This lock is held in the path of RPC processing. It should be held for short time. It's used in the same way in StatestoreSubscriber::StatestoreStub::UpdateState() in line #721 http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc@1332 PS1, Line 1332: if (update_skipped) { : // The subscriber skipped processing this update. It's not considered as a failure : // since subscribers can decide what they do with any update. The subscriber is : // left in the receiver list so that RPC will be resent to it in next round. : ++it; > Is it possible and okay for StatestoreSubcriber to repeatedly skip notifica Added log messages in subscriber side. http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift@358 PS1, Line 358: 2: optional bool skipped; > nit: Maybe worth to add 'string skip_reason' field for debug/trace purpose? Added log messages in subscriber side. -- To view, visit http://gerrit.cloudera.org:8080/20276 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie49947e563d43c59bdd476b28c35be69848ae12a Gerrit-Change-Number: 20276 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Thu, 27 Jul 2023 23:50:54 +0000 Gerrit-HasComments: Yes
