Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 )
Change subject: IMPALA-10642: Write support for Parquet Bloom filters - most common types ...................................................................... Patch Set 12: (6 comments) Change looks good. I have added few comments mostly nit. http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc@632 PS12, Line 632: bool ShouldUpdateParquetBloomFilter() const { : if (parquet_bloom_filter_ == nullptr) { : // There is no initialised ParquetBloomFilter, an error may have occured earlier. : return false; : } : : DCHECK(parquet_bloom_filter_tbl_prop_enabled_); : switch (parent_->state_->query_options().parquet_bloom_filter_write) { : case TParquetBloomFilterWrite::NEVER: { : return false; : } : case TParquetBloomFilterWrite::IF_NO_DICT: { : return !IsDictionaryEncoding(current_encoding_); : } : case TParquetBloomFilterWrite::ALWAYS: { : return true; : } : default: { : DCHECK(false) << "Unexpected enum variant: " << PrintThriftEnum( : parent_->state_->query_options().parquet_bloom_filter_write); : return false; : } : } : } : : bool ShouldInitParquetBloomFilter() const { : if (!parquet_bloom_filter_tbl_prop_enabled_) return false; : : switch (parent_->state_->query_options().parquet_bloom_filter_write) { : case TParquetBloomFilterWrite::NEVER: { : return false; : } : case TParquetBloomFilterWrite::IF_NO_DICT: { : return true; : } : case TParquetBloomFilterWrite::ALWAYS: { : return true; : } : default: { : DCHECK(false) << "Unexpected enum variant: " << PrintThriftEnum( : parent_->state_->query_options().parquet_bloom_filter_write); : return false; : } : } : } We can probably combine these 2 validation under something like isParquetBloomFilterAllowed(boolean update), where update is true for Update operation and for init its false. That may reduce the code duplication. http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc@715 PS12, Line 715: if (parquet_bloom_filter_tbl_prop_enabled_ If this method is invoked when parquet_bloom_filter_ is nullptr ( may be due to Init failure or Update failure), I think it might run into issue. We should probably check for that. Some check like ShouldUpdateParquetBloomFilter might be required. http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h File be/src/util/parquet-bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@41 PS12, Line 41: /// If 'always_false' is true, the implementation assumes that the directory is empty. nit: 'always_false' -> 'always_false_' http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@42 PS12, Line 42: 'always_false' nit: same as above http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@122 PS12, Line 122: bool always_false_; a comment above this member field will help. http://gerrit.cloudera.org:8080/#/c/17262/12/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17262/12/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@249 PS12, Line 249: private Map<String, Long> parseParquetBloomFilterWritingTblProp(final String tbl_prop) { Is it possible to write any Unit tests for this parsing logic covering various scenarios ? -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 15 Jun 2021 16:08:35 +0000 Gerrit-HasComments: Yes