Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: PROTOTYPE: IMPALA-12463: Batch non-consecutive table events in 
the event processor
......................................................................


Patch Set 3:

(4 comments)

I added test cases for the events that cut across batches (alter table, drop 
database, alter database) and cleaned up some of the code.

http://gerrit.cloudera.org:8080/#/c/20533/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20533/3//COMMIT_MSG@13
PS3, Line 13: could batched
> Nit: could be batched
Thanks for pointing that out, fixed


http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@327
PS3, Line 327:       TreeMap<Long, MetastoreEvent>
> Is my understanding correct that the map is only needed because batchable e
It's more about maintaining monotonically increasing Event IDs. Other parts of 
the code rely on the events being monotonically increasing in Event ID, and for 
batches that is the ending Event ID. Batching is no longer about contiguous 
events, so the ending Event ID for a batch is not known until we finalize the 
batch, either by hitting an event that cuts the batch or by running out of new 
events. The tableEventMap stores in-progress batches until we know their ending 
Event ID, which may not be until we reach the end of the events list.

i.e. a sequence like ABAB would be A(1-3), B(2-4), but a sequence like ABABA 
would be B(2-4), A(1-5) and we wouldn't know until we hit that last A. There 
could be a lot of intervening events for other tables like ABABCDEFGHIJKLMNA, 
but that last A still changes the order of the batch for A.

The sortedBatchedEvents is only for finalized batches. Some batches could be 
finalized by an event that cuts the batch, but a bunch will be finalized when 
we reach the end of the events list. The order that batches are finalized 
doesn't tell you the order they should be emitted. So, it does a sort to keep 
things monotonically increasing.

I renamed these structures to try to make it clearer. tableEventMap becomes 
pendingTableEventsMap. sortedBatchedEvents becomes sortedFinalBatches.


http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@346
PS3, Line 346:           while (it.hasNext()) {
> Some of the logic could be moved to helper functions to make the main loop
Good point, I split out that logic into helper methods.


http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@368
PS3, Line 368:             // Flush any batched events for the destination 
table.
> A note could be added about this being an invalid case, as the dest table s
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Dec 2023 05:59:35 +0000
Gerrit-HasComments: Yes

Reply via email to