Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22634 )

Change subject: IMPALA-13850: Wait until CatalogD active before resetting 
metadata
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG@13
PS5, Line 13: registration. catalog_lock_ was held by 
GatherCatalogUpdatesThread which
> Let's mention the lock is held by GatherCatalogUpdatesThread which is calli
Done


http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG@23
PS5, Line 23: TriggerResetMetadata that monitor and trigger this initial reset
> I think coordinators will still wait for the initial catalog updates from s
It appears to be the case.
https://github.com/apache/impala/commit/1d63348b933b266f63d76b06eecbdf636cb45770

I wonder if sending minimal catalog update for the first time is OK to signal 
that CatalogD is live.
Patch set 6 employs this strategy.

My WIP at https://gerrit.cloudera.org/c/22640 can break down long 
CatalogServiceCatalog.reset() call, but it can still block at slow-loading Db.


http://gerrit.cloudera.org:8080/#/c/22634/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/22634/5/be/src/catalog/catalog-server.cc@590
PS5, Line 590:   if (!status.ok()) {
             :     status.AddDetail("CatalogService failed to start");
             :     return status;
             :   }
             :   // Add callback to handle notification of updating catalogd 
from Statestore.
             :   if (FLAGS_enable_catalogd_ha) {
             :     S
> If FLAGS_force_catalogd_active is true, TriggerResetMetadataLocked() will b
Removed. catalog_first_reset_metadata_thread_ now managed it.


http://gerrit.cloudera.org:8080/#/c/22634/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/22634/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2322
PS5, Line 2322:           
DebugUtils.executeDebugAction(BackendConfig.INSTANCE.debugActions(),
              :               DebugUtils.RESET_METADATA_LOOP_LOCKED);
> What about moving this outside of the loop so the wait time is independent
I plan to have another debug action RESET_METADATA_LOOP_UNLOCKED in 
https://gerrit.cloudera.org/c/22640/.
I'd like both of them in Db loop so that we can tests 2 variety of slowdown 
while looping Db list.

For this specific patch though, it is sufficient for 
test_catalogd_auto_failover_slow to have the Db loop hold the lock for over 
than 1 second. I think without any debug action, this loop can complete in 
under half of a second.


http://gerrit.cloudera.org:8080/#/c/22634/5/tests/custom_cluster/test_catalogd_ha.py
File tests/custom_cluster/test_catalogd_ha.py:

http://gerrit.cloudera.org:8080/#/c/22634/5/tests/custom_cluster/test_catalogd_ha.py@109
PS5, Line 109:       assert page.status_code == requests.codes.ok
> Can we also verify the coordinator is healthy?
Done



--
To view, visit http://gerrit.cloudera.org:8080/22634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Gerrit-Change-Number: 22634
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 25 Mar 2025 02:58:04 +0000
Gerrit-HasComments: Yes

Reply via email to