Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20276 )

Change subject: IMPALA-12321: Fix the race condition for updating active 
catalogd
......................................................................


Patch Set 2:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@10
PS1, Line 10: on two different events:
            : catalogd registration, and new a
> nit: "on two different events: catalogd registration, and a new active cata
Done


http://gerrit.cloudera.org:8080/#/c/20276/1//COMMIT_MSG@12
PS1, Line 12: testored sends the active c
> nit: what is the difference? the value of 'for_registration' parameter?
One is StatestoreService.RegisterSubscriber from subscriber to statestore, 
another one is StatestoreSubscriber.UpdateCatalogd from statestore to 
subscriber.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@163
PS1, Line 163: catalog_objects_to/from_version_
> nit: is this refer to catalog_objects_max_version_?
yes


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.h@171
PS1, Line 171: bool last_update_for_registration_ = false;
> This is assigned in CatalogServer::UpdateActiveCatalogd, but never used any
Should be used in CatalogServer::UpdateActiveCatalogd(). Fixed.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@499
PS1, Line 499: is_registration_
> nit: I think 'is_registration_reply' is more appropriate.
Done


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/catalog/catalog-server.cc@523
PS1, Line 523: last_update_for_registration_ = false;
> nit: can move this after line 510 and delete the duplicate in line 520.
last_update_for_registration_ should be used in in line #512.
Fixed.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.h@330
PS1, Line 330: /// True if the last update of active catal
> This is assigned in ExecEnv::UpdateActiveCatalogd, but never used anywhere
Thanks to catch this.
It should be used in ExecEnv::UpdateActiveCatalogd(). Fixed


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/runtime/exec-env.cc@692
PS1, Line 692: last_update_for_registration_ = false;
> nit: can move this after line 679 and delete the duplicate in line 689.
last_update_for_registration_ should be used in line #681. Fixed.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc
File be/src/statestore/statestore-catalogd-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-catalogd-mgr.cc@126
PS1, Line 126: DCHECK(num_registered_catalogd_ == 2);
> Question: If I maliciously register more than 2 CatalogD, will statestore c
Yes. CatalogD HA is controlled by central manager or k8s server in production 
deployment. No more than two catalogd instances are created when HA is enabled. 
Otherwise, it's bug in the central management.
For unit-test, two catalogd instances are created by script 
start-impale-cluster.py when it's ran with option 'enable_catalogd_ha'.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.h@116
PS1, Line 116: /// for two events: receive registration RPC reply, receive 
UpdateCatalogD RPC request.
             :   ///   is_registration_reply: set it as true when the function 
is called for the
> Can we document the parameters here? I'd like to get clarification on:
done.


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@593
PS1, Line 593: // registration. Note that a
> Can we make sure that StateStore should never return negative active_catalo
Added DCHECK


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore-subscriber.cc@667
PS1, Line 667: nst Status& status = CheckRegistrationId(registration_id);
             :   if (status.ok()) {
             :     // Try to acquire lock to avoid race with updating catalogd 
from registration thread.
             :     s
> nit: Is it expected that StateStore will keep retrying and subscriber will
This lock is held in the path of RPC processing. It should be held for short 
time. It's used in the same way in 
StatestoreSubscriber::StatestoreStub::UpdateState() in line #721


http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/20276/1/be/src/statestore/statestore.cc@1332
PS1, Line 1332: if (update_skipped) {
              :         // The subscriber skipped processing this update. It's 
not considered as a failure
              :         // since subscribers can decide what they do with any 
update. The subscriber is
              :         // left in the receiver list so that RPC will be resent 
to it in next round.
              :         ++it;
> Is it possible and okay for StatestoreSubcriber to repeatedly skip notifica
Added log messages in subscriber side.


http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

http://gerrit.cloudera.org:8080/#/c/20276/1/common/thrift/StatestoreService.thrift@358
PS1, Line 358: 2: optional bool skipped;
> nit: Maybe worth to add 'string skip_reason' field for debug/trace purpose?
Added log messages in subscriber side.



--
To view, visit http://gerrit.cloudera.org:8080/20276
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie49947e563d43c59bdd476b28c35be69848ae12a
Gerrit-Change-Number: 20276
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Thu, 27 Jul 2023 23:50:54 +0000
Gerrit-HasComments: Yes

Reply via email to