[Impala-ASF-CR] IMPALA-14230: Add catch-up mode for event processing

Csaba Ringhofer (Code Review) Tue, 24 Feb 2026 23:32:11 -0800

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23942 )


Change subject: IMPALA-14230: Add catch-up mode for event processing
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/23942/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/23942/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1978
PS4, Line 1978:
> Added some comments.
I was thinking a bit more about this.
While avoiding the locking issues of isSelfEvent() and isOlderEvent() makes 
sense, the cheap check of table level createId and lastRefreshId should be done 
before invalidating the table IMO:

        if (getEventId() > 0 && getEventId() <= tbl.getCreateEventId())
and
        boolean canSkip = tbl.getLastRefreshEventId() >= getEventId();

Doing these first would be also useful regardless of catch up mode, as the 
locking on partitioned self event check could be avoided in some cases.

The scenario I am mainly concerned about is the following:
1. catch up mode starts due to large lag
2. meanwhile a table T is frequently used on a coordinator
3. there are some regular events on T (may not require full table reload)

While catch up mode is on, events will be evaluated very fast and T gets 
invalidated, but in the meantime we also start loading it due to coordinator 
requests. I see 2 kind of potential pathological behavior here:
a. my understanding is that each coordinator request will start to load the 
table if is an IncompleteTable so always replacing it with an IncompleteTable 
during events can lead to parallel reloads on the table (not 100% sure here)
b. once the catch up threshold is reached, we start processing the events 
again, and if those are slow, can easily go above the threshold again, and the 
work done will be thrown out on the next event ...

Checking getLastRefreshEventId() and avoiding the invalidation if the table is 
fresh enough should help here, because if the table was invalidated earlier due 
to catch up mode but was reloaded due to coordinator request, then its 
lastRefreshEventId should be far newer than the event lag, so all events will 
be skipped on this table for a long time.


http://gerrit.cloudera.org:8080/#/c/23942/5/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/23942/5/tests/custom_cluster/test_events_custom_configs.py@680
PS5, Line 680:       impalad_args="--use_local_catalog=false",
> Added a new class TestEventProcessingCatchupMode. Because COMMIT_TXN is a c
I assumed that since https://gerrit.cloudera.org/#/c/22901/ no new class is 
needed to merge tests with the same arguments (though it may be sensitive to 
the order of the tests)



--
To view, visit http://gerrit.cloudera.org:8080/23942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib906c06346d5d3159999eeac632e1318bc060065
Gerrit-Change-Number: 23942
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 25 Feb 2026 07:31:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-14230: Add catch-up mode for event processing

Reply via email to