Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19391 )

Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions
......................................................................


Patch Set 6:

(9 comments)

Looks like a very good optimization. Looks good to me.

http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@15
PS6, Line 15: interned
nit. internalized


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@18
PS6, Line 18: In fact
nit. In reality.


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@21
PS6, Line 21: adds codes to replace
nit. replaces


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@22
PS6, Line 22: hmsPartitions with the table level column list. Also add some 
progress
nit. to remove the duplications.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1862
PS6, Line 1862: fetchPartitionsByName
nit. This method probably should be renamed to fetchPartitionsByTable() due to 
the argument change.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5724
PS6, Line 5724:           LOG.info("Updated cache directive id for {}/{} 
partitions", numDone,
nit. may append "for table <t>" for clarity.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@208
PS6, Line 208: LOG.info("Fetched {}/{} partitions for table {}.{}", numDone, 
partNames.size(),
             :           msTbl.getDbName(), msTbl.getTableName());
This probably will log <maxPartitionsPerRpc_> number of progressive messages.  
In normal case, I think we can just log one message when <partNames.size()> == 
numDone outside the loop. The progressive style logging is needed only for 
abnormal situation.

This applies to other places where progressive logging is added in this patch.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@230
PS6, Line 230: respect
May need to clarify. Do you mean use?


http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py
File tests/custom_cluster/test_wide_table_operations.py:

http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@71
PS6, Line 71: 10mins for 50k partitions
nit. Is this the time after the patch or before?



--
To view, visit http://gerrit.cloudera.org:8080/19391
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Gerrit-Change-Number: 19391
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Dec 2022 15:24:56 +0000
Gerrit-HasComments: Yes

Reply via email to