Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 )
Change subject: PROTOTYPE: IMPALA-12463: Batch non-consecutive table events in the event processor ...................................................................... Patch Set 3: (4 comments) I added test cases for the events that cut across batches (alter table, drop database, alter database) and cleaned up some of the code. http://gerrit.cloudera.org:8080/#/c/20533/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20533/3//COMMIT_MSG@13 PS3, Line 13: could batched > Nit: could be batched Thanks for pointing that out, fixed http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@327 PS3, Line 327: TreeMap<Long, MetastoreEvent> > Is my understanding correct that the map is only needed because batchable e It's more about maintaining monotonically increasing Event IDs. Other parts of the code rely on the events being monotonically increasing in Event ID, and for batches that is the ending Event ID. Batching is no longer about contiguous events, so the ending Event ID for a batch is not known until we finalize the batch, either by hitting an event that cuts the batch or by running out of new events. The tableEventMap stores in-progress batches until we know their ending Event ID, which may not be until we reach the end of the events list. i.e. a sequence like ABAB would be A(1-3), B(2-4), but a sequence like ABABA would be B(2-4), A(1-5) and we wouldn't know until we hit that last A. There could be a lot of intervening events for other tables like ABABCDEFGHIJKLMNA, but that last A still changes the order of the batch for A. The sortedBatchedEvents is only for finalized batches. Some batches could be finalized by an event that cuts the batch, but a bunch will be finalized when we reach the end of the events list. The order that batches are finalized doesn't tell you the order they should be emitted. So, it does a sort to keep things monotonically increasing. I renamed these structures to try to make it clearer. tableEventMap becomes pendingTableEventsMap. sortedBatchedEvents becomes sortedFinalBatches. http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@346 PS3, Line 346: while (it.hasNext()) { > Some of the logic could be moved to helper functions to make the main loop Good point, I split out that logic into helper methods. http://gerrit.cloudera.org:8080/#/c/20533/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@368 PS3, Line 368: // Flush any batched events for the destination table. > A note could be added about this being an invalid case, as the dest table s Done -- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 12 Dec 2023 05:59:35 +0000 Gerrit-HasComments: Yes