Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 )
Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1527
PS4, Line 1527: reloadFileMetadata, reloadTableSchema, false, partitionsToUpdate,
              : debugAction, partitionToEventId, reason);
Shouldn't we consider reloadFileMetadata/reloadTableSchema/partitionsToUpdate? My concern is that we may update LastRefreshEventId() even for a partial reload of the table and ignore events that would trigger a full reload.


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1532
PS4, Line 1532: // Update the lastRefreshEventId at the table level if it is unpartitioned table
              : // if it is partitioned table, partitions are updated in HdfsTable#load() method
This also applies to other table types, e.g. Iceberg/Kudu/HBase, right?


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3222
PS4, Line 3222: createTable(testTblName, true);
Both this and the Python test used a partitioned table. As this feature is implemented differently for non-partitioned tables, it would be nice to test them too.

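To make the partial-reload concern raised for CatalogOpExecutor.java line 1527 concrete, here is a minimal, self-contained sketch of the guard I have in mind. It is not Impala code: the class RefreshEventIdSketch and the methods isFullReload, onReload and canSkipEvent are hypothetical names, and treating a null partitionsToUpdate as "all partitions" is an assumption made only for this illustration.

```java
import java.util.Set;

/** Hypothetical stand-in for the reload bookkeeping; not Impala's actual classes. */
class RefreshEventIdSketch {
  // Highest event id known to be covered by a full reload of the table.
  private long lastRefreshEventId = -1;

  /** Treats a reload as full only if it covers file metadata, schema and all partitions. */
  static boolean isFullReload(boolean reloadFileMetadata, boolean reloadTableSchema,
      Set<String> partitionsToUpdate) {
    // Assumption for this sketch: a null partition set means "all partitions".
    return reloadFileMetadata && reloadTableSchema && partitionsToUpdate == null;
  }

  /** Records a reload; only a full reload may advance lastRefreshEventId. */
  void onReload(boolean reloadFileMetadata, boolean reloadTableSchema,
      Set<String> partitionsToUpdate, long eventId) {
    if (isFullReload(reloadFileMetadata, reloadTableSchema, partitionsToUpdate)) {
      lastRefreshEventId = Math.max(lastRefreshEventId, eventId);
    }
  }

  /** An event can be skipped only if a full reload already covered it. */
  boolean canSkipEvent(long eventId) {
    return eventId <= lastRefreshEventId;
  }

  long getLastRefreshEventId() {
    return lastRefreshEventId;
  }
}
```

With a guard of this shape, a reload that only refreshes file metadata or a subset of partitions leaves lastRefreshEventId untouched, so a later event that needs a full reload is still processed.
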
http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3231
PS4, Line 3231: alterTableAddParameter(testTblName, "somekey", "someval");
Why is calling alterTable from Impala relevant here?


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3233
PS4, Line 3233: testTbl = (HdfsTable)catalog_.getOrLoadTable(TEST_DB_NAME, testTblName,
              : "test", null);
              : assertTrue(testTbl.getLastRefreshEventId()
Shouldn't we load a table and call getLastRefreshEventId before eventsProcessor_.processEvents();?


--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Aug 2023 13:55:46 +0000
Gerrit-HasComments: Yes