Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/19391 )
Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions ...................................................................... Patch Set 6: (9 comments) Looks like a very good optimization. Looks good to me. http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@15 PS6, Line 15: interned nit. internalized http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@18 PS6, Line 18: In fact nit. In reality. http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@21 PS6, Line 21: adds codes to replace nit. replaces http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@22 PS6, Line 22: hmsPartitions with the table level column list. Also add some progress nit. to remove the duplications. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1862 PS6, Line 1862: fetchPartitionsByName nit. This method probably should be renamed to fetchPartitionsByTable() due to the argument change. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5724 PS6, Line 5724: LOG.info("Updated cache directive id for {}/{} partitions", numDone, nit. may append "for table <t>" for clarity. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@208 PS6, Line 208: LOG.info("Fetched {}/{} partitions for table {}.{}", numDone, partNames.size(), : msTbl.getDbName(), msTbl.getTableName()); This probably will log <maxPartitionsPerRpc_> number of progressive messages. In normal case, I think we can just log one message when <partNames.size()> == numDone outside the loop. The progressive style logging is needed only for abnormal situation. This applies to other places where progressive logging is added in this patch. http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@230 PS6, Line 230: respect May need to clarify. Do you mean use? http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py File tests/custom_cluster/test_wide_table_operations.py: http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@71 PS6, Line 71: 10mins for 50k partitions nit. Is this the time after the patch or before? -- To view, visit http://gerrit.cloudera.org:8080/19391 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d Gerrit-Change-Number: 19391 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Thu, 29 Dec 2022 15:24:56 +0000 Gerrit-HasComments: Yes