Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11341 )
Change subject: IMPALA-7424: Optimize in-memory representation of incremental stats ...................................................................... Patch Set 4: (22 comments) http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@7 PS4, Line 7: Optimize nit: Reduce http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@10 PS4, Line 10: HMS parameters map of partition objects. Each of these strings are Java nit: ... when stored in catalogd. http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@11 PS4, Line 11: takes nit: take http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@17 PS4, Line 17: on-disk persistent? http://gerrit.cloudera.org:8080/#/c/11341/4//COMMIT_MSG@26 PS4, Line 26: optimizes nit: improves http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogObjects.thrift@297 PS4, Line 297: Deflate compressed byte[] still having trouble understanding this one. how about: "byte[] representation of TPartitionStats for this partition that is compressed using 'deflate-compression'"? http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@283 PS4, Line 283: includes nit: should include (for consistency) http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@284 PS4, Line 284: and then nit: that is http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@302 PS4, Line 302: want_partition_incremental_stats sync this with the field name in that struct. http://gerrit.cloudera.org:8080/#/c/11341/4/common/thrift/CatalogService.thrift@307 PS4, Line 307: want_partition_incremental_stats sync http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@a446 PS4, Line 446: any reason to get rid of these? http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java File fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@120 PS4, Line 120: * are stored as a deflate-compressed byte array to reduce memory footprint. perhaps include the util method to use to turn it into a TPartitionStats? http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java@158 PS4, Line 158: serialized nit: remove http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@527 PS4, Line 527: Set once and reused : // since thrift deserialization is costly. remove http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@637 PS4, Line 637: public void setPartitionStatsBytes(byte[] partitionStats, boolean hasIncrStats) { you mentioned on L527 that this should be set once-- perhaps enforce it? http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@638 PS4, Line 638: partitionStats_ = partitionStats; is there some sanity check here, for example, hasIncrStats => partitionStats != null ? http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java File fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@119 PS4, Line 119: (partStats)); strange formatting http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java@139 PS4, Line 139: tition.hasIncrementalStats() what happens with the non-incremental stats in this case? (e.g., TPartitionStats.stats) http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@60 PS4, Line 60: Set once and reused : // since thrift deserialization is costly. this clarification refers to the current case where we deserialize for each access. a new reader does not have this context, so I'd omit this part. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java File fe/src/main/java/org/apache/impala/util/CompressionUtil.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@48 PS4, Line 48: incremental stats keep it more general here http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/main/java/org/apache/impala/util/CompressionUtil.java@61 PS4, Line 61: incr stats, internal error keep it general, and consistent with the error message for compressing. http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java File fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java: http://gerrit.cloudera.org:8080/#/c/11341/4/fe/src/test/java/org/apache/impala/util/CompressionUtilTest.java@35 PS4, Line 35: Assert.assertEquals(new String(decompressed), test); include empty string/bytes. -- To view, visit http://gerrit.cloudera.org:8080/11341 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02 Gerrit-Change-Number: 11341 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Tue, 28 Aug 2018 22:51:44 +0000 Gerrit-HasComments: Yes