Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20131 )
Change subject: IMPALA-12152: Add query option to wait for events sync up ...................................................................... Patch Set 12: (9 comments) Mainly high level comments, I haven't processed all the details yet. http://gerrit.cloudera.org:8080/#/c/20131/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20131/12//COMMIT_MSG@23 PS12, Line 23: Note that the current implementation waits for the latest event id when : the WaitForHmsEvent RPC is received at catalogd side. We can improve it : once HIVE-27499 is resolved, so we can efficiently detect whether some : given tables have unprocessed events and just wait for the *largest* id : of them. Tables that have no unprocessed events don't need to block the : query planning. Are there plans on how to implement this from Impala side? What bugs me is the order of things when executing the query - currently WaitForHmsEvent() is called before creating the plan, but to get the list of tables involved, we will need some preliminary analyses before calling WaitForHmsEvent(). Even without HIVE-27499 we could profit a bit from this by skipping waiting for events for queries that do not reference any table. http://gerrit.cloudera.org:8080/#/c/20131/12/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/20131/12/be/src/service/client-request-state.h@468 PS12, Line 468: SetErrorMsg The name could indicate that this should be used early in the life cycle of the query. For example SetEarlyErrorMsg(). + same for line 660 http://gerrit.cloudera.org:8080/#/c/20131/12/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20131/12/be/src/service/impala-server.h@445 PS12, Line 445: /// Wait until catalog server catches up the latest HMS event. nit: catches up to? http://gerrit.cloudera.org:8080/#/c/20131/12/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/20131/12/common/thrift/CatalogService.thrift@691 PS12, Line 691: CatalogServiceVersion I see that it is a common pattern to use it like this, but why don't we add the header to the TCatalogServiceRequestHeader? http://gerrit.cloudera.org:8080/#/c/20131/12/common/thrift/CatalogService.thrift@751 PS12, Line 751: WaitForHmsEvent High level question: wouldn't it be better to just get the current current event id here and wait for statestore updates to propagate it? I don't know whather it is propagated now, but it doesn't seem hard to send the current synced event id with the catalog updates. My concern with this RPC is that it can be very long if the timeout is high. An example where this could complicate thing is query cancellation. Another one is catalogd change in the background - waiting for synced event id from catalog topic update could allow to handle this correctly even if the catalogd is restarted. http://gerrit.cloudera.org:8080/#/c/20131/12/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/20131/12/common/thrift/Query.thrift@668 PS12, Line 668: sync_hms_events_strict_mode I didn't find any test for this. http://gerrit.cloudera.org:8080/#/c/20131/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20131/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3810 PS12, Line 3810: long waitForEventId = metastoreEventProcessor_.getCurrentEventId(); Maybe it would be better to move some of this logic inside MetastoreEventProcessor, e.g. a function like waitForSyncUpToCurrentEvent(long timeout)? Some optimizations may be easier to implement inside EventProcessorStatus. One example is that getCurrentEventId() requests could be merged waitForHmsEvent() calls that come in a small window to avoid excessive number of RPCs to HMS. I don't mean to implement this now, but it may be useful in the future to efficiently handle large number of parallel queries. http://gerrit.cloudera.org:8080/#/c/20131/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3817 PS12, Line 3817: while The loop could also check MetastoreEventsProcessor state to stop waiting if the event processor had some error. http://gerrit.cloudera.org:8080/#/c/20131/12/tests/metadata/test_event_processing.py File tests/metadata/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/20131/12/tests/metadata/test_event_processing.py@377 PS12, Line 377: N Shouldn't we also set strict mode in tests like this? My understanding is that this won't cause an error if the event processor goes to an error state and doesn't sync up for 10s -- To view, visit http://gerrit.cloudera.org:8080/20131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36ac941bb2c2217b09fcfa2eb567b011b38efa2a Gerrit-Change-Number: 20131 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Wed, 30 Aug 2023 10:00:50 +0000 Gerrit-HasComments: Yes