Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23942 )
Change subject: IMPALA-14230: Add catch-up mode for event processing ...................................................................... Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/23942/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/23942/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1978 PS4, Line 1978: > Added some comments. I was thinking a bit more about this. While avoiding the locking issues of isSelfEvent() and isOlderEvent() makes sense, the cheap check of table level createId and lastRefreshId should be done before invalidating the table IMO: if (getEventId() > 0 && getEventId() <= tbl.getCreateEventId()) and boolean canSkip = tbl.getLastRefreshEventId() >= getEventId(); Doing these first would be also useful regardless of catch up mode, as the locking on partitioned self event check could be avoided in some cases. The scenario I am mainly concerned about is the following: 1. catch up mode starts due to large lag 2. meanwhile a table T is frequently used on a coordinator 3. there are some regular events on T (may not require full table reload) While catch up mode is on, events will be evaluated very fast and T gets invalidated, but in the meantime we also start loading it due to coordinator requests. I see 2 kind of potential pathological behavior here: a. my understanding is that each coordinator request will start to load the table if is an IncompleteTable so always replacing it with an IncompleteTable during events can lead to parallel reloads on the table (not 100% sure here) b. once the catch up threshold is reached, we start processing the events again, and if those are slow, can easily go above the threshold again, and the work done will be thrown out on the next event ... Checking getLastRefreshEventId() and avoiding the invalidation if the table is fresh enough should help here, because if the table was invalidated earlier due to catch up mode but was reloaded due to coordinator request, then its lastRefreshEventId should be far newer than the event lag, so all events will be skipped on this table for a long time. http://gerrit.cloudera.org:8080/#/c/23942/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/23942/5/tests/custom_cluster/test_events_custom_configs.py@680 PS5, Line 680: impalad_args="--use_local_catalog=false", > Added a new class TestEventProcessingCatchupMode. Because COMMIT_TXN is a c I assumed that since https://gerrit.cloudera.org/#/c/22901/ no new class is needed to merge tests with the same arguments (though it may be sensitive to the order of the tests) -- To view, visit http://gerrit.cloudera.org:8080/23942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib906c06346d5d3159999eeac632e1318bc060065 Gerrit-Change-Number: 23942 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 25 Feb 2026 07:31:54 +0000 Gerrit-HasComments: Yes
