Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21031 )
Change subject: IMPALA-12709: Add support for hierarchical metastore event processing ...................................................................... Patch Set 38: (7 comments) http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@996 PS36, Line 996: public static class PseudoCommitTxnEvent extends MetastoreTableEvent > Shall i take it up as next PR? Sure. Just a thought. http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java File fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java: http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java@263 PS38, Line 263: * @param skipEventId Skip all the events that are less than this event Id nit: stale comment http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java@457 PS38, Line 457: long getOutstandingEventCount() { : long outstandingEventCount; : synchronized (outstandingEventCountLock_) { : outstandingEventCount = outstandingEventCount_; : } : return outstandingEventCount + tableEventExecutors_.stream() : .mapToLong(TableEventExecutor::getOutstandingEventCount).sum(); : } > Looking again, we might have a potential split-brain problem here by having Checked the usages of the final count returned in MetastoreEventProcessor#getOutstandingEventCount(), it seems we just care about whether it's 0 or exceeding the limit of max_outstanding_events_on_executors. Maybe it doesn't worth synchronizing all the event executors to get an accurate number. If we get 0 from MetastoreEventProcessor#getOutstandingEventCount(), it means all db/table event executors have outstandingEventCount_ == 0 since all of them are >= 0. If we see the total count exceeds max_outstanding_events_on_executors, the accurate number is close to that value and is OK to stop for a while. Another usage is in the metrics which also doesn't impact correctness. So it seems the current code is OK. But I might miss something. http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@804 PS38, Line 804: long count = 0; : if (eventExecutorService_ != null) { : count = eventExecutorService_.getOutstandingEventCount(); : } : return count; nit: can be simplified to if (eventExecutorService_ == null) return 0; return eventExecutorService_.getOutstandingEventCount(); http://gerrit.cloudera.org:8080/#/c/21031/38/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1463 PS38, Line 1463: lastSyncedEventId_.set(event.getEventId()); > I understand that eventExecutorService_ enable us from processing events in If we want this, we'd better add a flag for it. Note that slowness in processing an event will still block events in the next batch. E.g. if processing an event takes 30mins, it will add to the lag and impact unrelated tables, which is one scenario we want to fix. By adding the flag, we can turn it off (not synchronizing batches) in perf test or in production. http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/main/java/org/apache/impala/catalog/events/RenameTableBarrierEvent.java File fe/src/main/java/org/apache/impala/catalog/events/RenameTableBarrierEvent.java: http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/main/java/org/apache/impala/catalog/events/RenameTableBarrierEvent.java@72 PS36, Line 72: nt pr > Thank you for your explanation. Catalog and HMS event code manage catalog s Agree with Riza that adding "synchronized" to the methods is more maintainable. The current code has an assumption on how the reader threads use/read these values. It might perform better but we don't know how much it is actually. Might not worth taking the risk of introducing a race condition bug if other developers don't read the comments carefully and break this assumption in the future. After this patch is merged, we can do some perf test to see If adding "synchronized" is a bottleneck and optimize it in needs. http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java File fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java: http://gerrit.cloudera.org:8080/#/c/21031/36/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java@158 PS36, Line 158: if (event instanceof DbBarrierEvent) { > Yes. Done. Do we have test coverage for such cases? -- To view, visit http://gerrit.cloudera.org:8080/21031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6 Gerrit-Change-Number: 21031 Gerrit-PatchSet: 38 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Thu, 20 Mar 2025 11:33:08 +0000 Gerrit-HasComments: Yes
