Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20276 )

Change subject: IMPALA-12321: Fix the race condition for updating active 
catalogd
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@10
PS1, Line 10: during registration, or
            : a new active catalogd is elected
nit: "on two different events: catalogd registration, and a new active catalogd 
election change"


http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@12
PS1, Line 12: two different kinds of RPCs
nit: what is the difference? the value of 'for_registration' parameter?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@163
PS1, Line 163: catalog_objects_to/from_version_
nit: is this refer to catalog_objects_max_version_?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@171
PS1, Line 171: bool last_update_for_registration_ = false;
This is assigned in CatalogServer::UpdateActiveCatalogd, but never used 
anywhere else. Maybe can be removed?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@499
PS1, Line 499: for_registration
nit: I think 'is_registration_reply' is more appropriate.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@523
PS1, Line 523: last_update_for_registration_ = false;
nit: can move this after line 510 and delete the duplicate in line 520.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h@330
PS1, Line 330: bool last_update_for_registration_ = false;
This is assigned in ExecEnv::UpdateActiveCatalogd, but never used anywhere 
else. Maybe can be removed?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc@692
PS1, Line 692: last_update_for_registration_ = false;
nit: can move this after line 679 and delete the duplicate in line 689.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc
File be/src/statestore/statestore-catalogd-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc@126
PS1, Line 126: DCHECK(num_registered_catalogd_ == 2);
Question: If I maliciously register more than 2 CatalogD, will statestore crash 
hitting this DCHECK?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h@116
PS1, Line 116: typedef boost::function<void (bool for_registration, int64_t 
active_catalogd_version,
             :       const TCatalogRegistration& catalogd_registration)> 
UpdateCatalogdCallback;
Can we document the parameters here? I'd like to get clarification on:
- What is the meaning of 'for_registration' value?
- What is the meaning of negative active_catalogd_version? 
active_catalogd_version parameter in this function looks like a meld between 
has_active_catalogd and active_catalogd_version from StatestoreStub::Register.
- When is catalogd_registration deemed invalid and safe to ignore?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@593
PS1, Line 593: active_catalogd_version = -1
Can we make sure that StateStore should never return negative 
active_catalogd_version before this override?
Maybe add DCHECK or WARNING log.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@667
PS1, Line 667: if (!l.owns_lock()) {
             :       *update_skipped = true;
             :       return;
             :     }
nit: Is it expected that StateStore will keep retrying and subscriber will 
eventually owns the lock and process the RPC instead of skipping indefinitely?


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc@1332
PS1, Line 1332: if (update_skipped) {
              :         // The subscriber skipped processing this update. It's 
not considered as a failure
              :         // since subscribers can decide what they do with any 
update. The subscriber is
              :         // left in the receiver list so that RPC will be resent 
to it in next round.
              :         ++it;
Is it possible and okay for StatestoreSubcriber to repeatedly skip notification 
without error?
Adding logs in the StateStore or the subscriber side can help triage if bad 
case do happen.


http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift@358
PS1, Line 358: 2: optional bool skipped;
nit: Maybe worth to add 'string skip_reason' field for debug/trace purpose?



--
To view, visit http://gerrit.cloudera.org:8080/20276
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie49947e563d43c59bdd476b28c35be69848ae12a
Gerrit-Change-Number: 20276
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Thu, 27 Jul 2023 20:20:18 +0000
Gerrit-HasComments: Yes

Reply via email to