Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 )
Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ...................................................................... Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> txnToWriteIds_ = : new ConcurrentHashMap<>(); > The difficulty here is we have some DDLs advancing write id without changin Thanks for the clarification. "the new HMS API getAllWriteEventInfo only return info for WRITE events" I am not sure why this is any different for DDLs v/s insert. Can you please provide some more details why this API does not return DDL writeIds? http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891 PS6, Line 2891: case do we need a default: clause which throws a exception? http://gerrit.cloudera.org:8080/#/c/17858/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17858/5/fe/src/main/java/org/apache/impala/catalog/Table.java@142 PS5, Line 142: volatile is this needed? http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java@142 PS6, Line 142: volatile not sure why we need this? http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1968 PS6, Line 1968: AllocWriteIdEvent please add class comment for newly added classes. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2049 PS6, Line 2049: writeEventInfoList = client.getHiveClient().getAllWriteEventInfo( : new GetAllWriteEventInfoRequest(txnId_)); a comment here giving details on what this API returns will be helpful for the reader. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2066 PS6, Line 2066: catalog_.removeWriteIds(txnId_); This line must be in finally block otherwise we are leaking memory in case an exception is thrown in the lines above. http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2079 PS6, Line 2079: addCommittedWriteIdsAndRefreshPartitions It looks like this method is refresh one partition at a time. Can we send all the partitions so that they are reloaded in parallel? http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4256 PS6, Line 4256: !(table instanceof HdfsTable) do we need a check for MVs here? http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4271 PS6, Line 4271: // HMS doesn't set partition object's write id correctly, so we set : // it here as workaround. : long writeId = writeIds.get(it.nextIndex()); : Partition part = it.next(); : part.setWriteId(writeId); This could lead to confusion, for instance, we may be sending the partition objects later to HMS and they may overwrite partition writeIds in HMS. I understand that it is not used today, but this could cause subtle bugs which would be very hard to diagnose in case we add partition level writeId in future. Since the goal here to just track the writeId in catalogd per partition, can we just add a new field in HdfsPartition for writeId_? -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 6 Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Tue, 19 Oct 2021 17:44:34 +0000 Gerrit-HasComments: Yes