Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 )
Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ...................................................................... Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> txnToWriteIds_ = : new ConcurrentHashMap<>(); > Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re @Yu-Wen: In addition to what Vihang asked, how would we handle the following scenario: Suppose the following HMS events are generated for a ddl on a partition: 1. Open txn 2. Allocate write id 3. perform ddl op 4. Commit txn Now if event processor is started and say it starts processing events from #3. Assuming that write id generated in #2 is not returned in HMS API getAllWriteEventInfo, does this mean that event processor would skip adding writeId from #2 to committed writeIds list? If so, is this the right/desired behavior? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057 PS7, Line 2057: commitTxnMessage_.addWriteEventInfo(writeEventInfoList); Why are we modifying commitTxnMesage? Can't we get all the required info from writeEventInfoList? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080 PS7, Line 2080: Preconditions.checkNotNull(commitTxnMessage_.getPartitions()); Why are we checking for non null partitions? Wouldn't unpartitioned table have null partitions? http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4160 PS7, Line 4160: List<Partition> partsFromEvent, boolean loadFromEvent, boolean loadFileMetadata, Looks like if loadFromEvent is true, then in HdfsTable we avoid making HMS api getPartitionsByNames(). I don't think adding loadFromEvent flag is the right way to do this. We should have clear apis in catalogOpExecutor, HdfsTable etc to reload partitions based on partition names and partition objects. http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4233 PS7, Line 4233: public int addCommittedWriteIdsAndReloadPartitionsIfExist(long eventId, String dbName, This api is not making much sense to me here. More so since it is public. Should we move this inside CommitTxnEvent? Some parts like getting table from cache can be moved to a common place in catalogOpExecutor. -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 7 Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Wed, 20 Oct 2021 17:27:10 +0000 Gerrit-HasComments: Yes