Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: IMPALA-10642: Write support for Parquet Bloom filters - most 
common types
......................................................................


Patch Set 12:

(6 comments)

Change looks good. I have added few comments mostly nit.

http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc@632
PS12, Line 632:   bool ShouldUpdateParquetBloomFilter() const {
              :     if (parquet_bloom_filter_ == nullptr) {
              :       // There is no initialised ParquetBloomFilter, an error 
may have occured earlier.
              :       return false;
              :     }
              :
              :     DCHECK(parquet_bloom_filter_tbl_prop_enabled_);
              :     switch 
(parent_->state_->query_options().parquet_bloom_filter_write) {
              :       case TParquetBloomFilterWrite::NEVER: {
              :         return false;
              :       }
              :       case TParquetBloomFilterWrite::IF_NO_DICT: {
              :         return !IsDictionaryEncoding(current_encoding_);
              :       }
              :       case TParquetBloomFilterWrite::ALWAYS: {
              :         return true;
              :       }
              :       default: {
              :         DCHECK(false) << "Unexpected enum variant: " << 
PrintThriftEnum(
              :             
parent_->state_->query_options().parquet_bloom_filter_write);
              :         return false;
              :       }
              :     }
              :   }
              :
              :   bool ShouldInitParquetBloomFilter() const {
              :     if (!parquet_bloom_filter_tbl_prop_enabled_) return false;
              :
              :     switch 
(parent_->state_->query_options().parquet_bloom_filter_write) {
              :       case TParquetBloomFilterWrite::NEVER: {
              :         return false;
              :       }
              :       case TParquetBloomFilterWrite::IF_NO_DICT: {
              :         return true;
              :       }
              :       case TParquetBloomFilterWrite::ALWAYS: {
              :         return true;
              :       }
              :       default: {
              :         DCHECK(false) << "Unexpected enum variant: " << 
PrintThriftEnum(
              :             
parent_->state_->query_options().parquet_bloom_filter_write);
              :         return false;
              :       }
              :     }
              :   }
We can probably combine these 2 validation under something like 
isParquetBloomFilterAllowed(boolean update), where update is true for Update 
operation and for init its false. That may reduce the code duplication.


http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/exec/parquet/hdfs-parquet-table-writer.cc@715
PS12, Line 715:     if (parquet_bloom_filter_tbl_prop_enabled_
If this method is invoked when parquet_bloom_filter_ is nullptr ( may be due to 
Init failure or Update failure), I think it might run into issue. We should 
probably check for that. Some check like ShouldUpdateParquetBloomFilter might 
be required.


http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h
File be/src/util/parquet-bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@41
PS12, Line 41:   /// If 'always_false' is true, the implementation assumes that 
the directory is empty.
nit: 'always_false' -> 'always_false_'


http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@42
PS12, Line 42: 'always_false'
nit: same as above


http://gerrit.cloudera.org:8080/#/c/17262/12/be/src/util/parquet-bloom-filter.h@122
PS12, Line 122:   bool always_false_;
a comment above this member field will help.


http://gerrit.cloudera.org:8080/#/c/17262/12/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17262/12/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@249
PS12, Line 249:   private Map<String, Long> 
parseParquetBloomFilterWritingTblProp(final String tbl_prop) {
Is it possible to write any Unit tests for this parsing logic covering various 
scenarios ?



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 16:08:35 +0000
Gerrit-HasComments: Yes

Reply via email to