Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
......................................................................


Patch Set 4:

(9 comments)

Thanks a lot for working on this. This is definitely very useful. I took a 
first pass at this. I will take another look at this while you address the 
comments above.

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7
PS4, Line 7: request
nit, May be change this to say "ACID table" to be more specific.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125
PS4, Line 2125: tbl.readLock().lock();
Can you add a Preconditions check before this line to make sure that the table 
is a ACID table? This makes sure that we don't regress unintentionally by 
taking read lock on non-transactional tables here.

Preconditions.checkState(AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters())));


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS4, Line 2127: refreshNeededPartitions
nit, can we rename this variable to something like "partsToBeRefreshed" to make 
it more readable?


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187
PS4, Line 2187: HdfsTable
change to "ACID tables" since external tables are also HdfsTables


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486
PS4, Line 3486:     TableLoadingMgr.LoadRequest loadReq = 
tableLoadingMgr_.getLoadRequest(tableName);
              :     if (loadReq != null) {
              :       try {
              :         return loadReq.get();
              :       } finally {
              :         loadReq.close();
              :       }
              :     }
This logic seems to have a race condition. How do we know that the loadReq got 
completed before or after compaction? If it completed just before compaction, 
we are still going to return uncompacted files. Is this just for optimization 
or needed for certain edge case?


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@727
PS4, Line 727: public
since this method is now made public, I think we should add a Preconditions 
check here to make sure that the table write lock was table before calling this 
method.

Preconditions.checkState(this.isWriteLockedByCurrentThread());


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@760
PS4, Line 760: there is a compaction
nit, another compaction


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824
PS4, Line 824:     Map<String, HdfsPartition.Builder> nameToPartition =
             :         partBuilders.stream().collect(Collectors.toMap(
             :             HdfsPartition.Builder::getPartitionName, builder -> 
builder));
If you move this to line 805 you can avoid iterating the partBuilders twice by 
passing new ArrayList<>(nameToPartition.keySet()) on line 812.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827
PS4, Line 827: for (CompactionInfoStruct lci : resp.getCompactions()) {
             :       String partName = lci.getPartitionname() == null ? 
DEFAULT_PARTITION_NAME :
             :                                                          
lci.getPartitionname();
             :       HdfsPartition.Builder partition = 
nameToPartition.get(partName);
             :       partition.setLastCompactionId(lci.getId());
             :     }
I think the code readability can be improved if you handle the non-partitioned 
and partitioned case separately. e.g

if (isPartitioned()) {
     for (CompactionInfoStruct lci : resp.getCompactions()) {
                                     
      HdfsPartition.Builder partition = 
nameToPartition.get(lci.getPartitionname());
      Preconditions.checkNotNull(partition);
      partition.setLastCompactionId(lci.getId());
    }
} else {
   CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions());
   HdfsPartition.Builder partBuilder = Iterables.getOnlyElement(partBuilders);
   partBuilder.setLastCompactionId(ci.getId());
}



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 22:46:09 +0000
Gerrit-HasComments: Yes

Reply via email to