Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
......................................................................


Patch Set 24:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7
PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common 
types
Do we filter fields in complex types as well, e.g. elements of an array? Or 
only top-level columns?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887
PS24, Line 887:     Status bloom_filter_status = ProcessBloomFilter(row_group,
We have query options to disable row group/page index filtering. Probably we 
should add a query option for bloom filtering as well, so in case there's a bug 
in the code the users will be able to disable it.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483
PS24, Line 1483: ReadToBuffer
nit: thanks for adding this member function. Can we use it at 
ParquetPageIndex::ReadAll() as well?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1497
PS24, Line 1497: cache_options));
nit: fits previous line


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1502
PS24, Line 1502:
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716
PS24, Line 1716:       if (!bloom_filter.Find(hash)) {
nit: could you please add some logging at VLOG(3)? E.g. which conjunct filtered 
which row group in which file?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37
PS24, Line 37: May or may not use
nit: could you add some details please? When storage is used, and when it 
isn't? And when is 'ptr' used?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/kudu/util/block_bloom_filter.cc
File be/src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/kudu/util/block_bloom_filter.cc@135
PS24, Line 135:
nit: too much indentation


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/thirdparty/xxhash/xxhash.h@70
PS24, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
> line too long (112 > 90)
Probably you could add extend EXCLUDE_FILE_PATTERNS to thirdparty files in 
bin/jenkins/critique-gerrit-review.py.



--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 24
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Apr 2021 13:29:30 +0000
Gerrit-HasComments: Yes

Reply via email to