Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )
Change subject: IMPALA-10801: Check the latest compaction Id before serving request ...................................................................... Patch Set 4: (9 comments) Thanks a lot for working on this. This is definitely very useful. I took a first pass at this. I will take another look at this while you address the comments above. http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7 PS4, Line 7: request nit, May be change this to say "ACID table" to be more specific. http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125 PS4, Line 2125: tbl.readLock().lock(); Can you add a Preconditions check before this line to make sure that the table is a ACID table? This makes sure that we don't regress unintentionally by taking read lock on non-transactional tables here. Preconditions.checkState(AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters()))); http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127 PS4, Line 2127: refreshNeededPartitions nit, can we rename this variable to something like "partsToBeRefreshed" to make it more readable? http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187 PS4, Line 2187: HdfsTable change to "ACID tables" since external tables are also HdfsTables http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486 PS4, Line 3486: TableLoadingMgr.LoadRequest loadReq = tableLoadingMgr_.getLoadRequest(tableName); : if (loadReq != null) { : try { : return loadReq.get(); : } finally { : loadReq.close(); : } : } This logic seems to have a race condition. How do we know that the loadReq got completed before or after compaction? If it completed just before compaction, we are still going to return uncompacted files. Is this just for optimization or needed for certain edge case? http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@727 PS4, Line 727: public since this method is now made public, I think we should add a Preconditions check here to make sure that the table write lock was table before calling this method. Preconditions.checkState(this.isWriteLockedByCurrentThread()); http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@760 PS4, Line 760: there is a compaction nit, another compaction http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824 PS4, Line 824: Map<String, HdfsPartition.Builder> nameToPartition = : partBuilders.stream().collect(Collectors.toMap( : HdfsPartition.Builder::getPartitionName, builder -> builder)); If you move this to line 805 you can avoid iterating the partBuilders twice by passing new ArrayList<>(nameToPartition.keySet()) on line 812. http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827 PS4, Line 827: for (CompactionInfoStruct lci : resp.getCompactions()) { : String partName = lci.getPartitionname() == null ? DEFAULT_PARTITION_NAME : : lci.getPartitionname(); : HdfsPartition.Builder partition = nameToPartition.get(partName); : partition.setLastCompactionId(lci.getId()); : } I think the code readability can be improved if you handle the non-partitioned and partitioned case separately. e.g if (isPartitioned()) { for (CompactionInfoStruct lci : resp.getCompactions()) { HdfsPartition.Builder partition = nameToPartition.get(lci.getPartitionname()); Preconditions.checkNotNull(partition); partition.setLastCompactionId(lci.getId()); } } else { CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions()); HdfsPartition.Builder partBuilder = Iterables.getOnlyElement(partBuilders); partBuilder.setLastCompactionId(ci.getId()); } -- To view, visit http://gerrit.cloudera.org:8080/17697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e Gerrit-Change-Number: 17697 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Fri, 23 Jul 2021 22:46:09 +0000 Gerrit-HasComments: Yes