Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
......................................................................


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG@21
PS16, Line 21: evaluted
nit: evaluated


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549:   int64_t tuple_size = min_max_tuple_desc->byte_size();
> That seems a good idea, in that the new logic here can be moved over to the
Was the idea that whenever a min/max filter arrives, we extend 
min_max_tuple_ with a pair of slots (the filter's min and max values) and 
min_max_conjunct_evals_ with two new predicates (filter_min <= data and 
filter_max >= data)?

Creating the slot descriptors dynamically can be cumbersome. Maybe we could 
just create the descriptors in advance, as we already do AFAICT, and only 
evaluate the conjuncts whose filter has arrived?

I think it's an interesting idea, and this direction is worth investigating. 
It could probably simplify the code a lot because we'd get row group-level, 
page-level, and row-level filtering for free.
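
To make the two predicates concrete, here is a minimal sketch of the overlap 
test they amount to when evaluated against column statistics. It assumes 
filter_min <= data is checked against the statistics' max value and 
filter_max >= data against the statistics' min value. MinMaxRange and 
MayOverlap are hypothetical names for illustration, not existing Impala code, 
and int64_t stands in for whatever the column type is.

```cpp
#include <cstdint>

// Hypothetical stand-in for a [min, max] pair. It is used both for the
// statistics of a row group or page and for the values of a min/max filter.
struct MinMaxRange {
  int64_t min;
  int64_t max;
};

// Returns true if a row group or page with statistics 'stats' may contain a
// value v such that filter.min <= v <= filter.max. When this returns false,
// no value can pass the filter and the row group or page can be skipped.
inline bool MayOverlap(const MinMaxRange& stats, const MinMaxRange& filter) {
  // filter_min <= data, evaluated against the largest value in the stats.
  const bool min_pred = filter.min <= stats.max;
  // filter_max >= data, evaluated against the smallest value in the stats.
  const bool max_pred = filter.max >= stats.min;
  return min_pred && max_pred;
}
```

If the filter for a column has not arrived yet, the simplest policy in this 
sketch would be to not call MayOverlap for that column at all, which matches 
the "only evaluate the conjuncts whose filter has arrived" idea.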


http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@493
PS17, Line 493: std::pair<Status, ColumnStatsReader>
Nit: I think we usually return only a Status and pass any additional results 
back through output parameters.
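
A minimal sketch of the suggested shape, for illustration only. The Status 
and ColumnStatsReader types below are simplified stand-ins, not the real 
Impala classes, and CreateStatsReader is a hypothetical function name.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for a status type (assumption, not Impala's Status).
class Status {
 public:
  static Status OK() { return Status(); }
  explicit Status(std::string msg) : msg_(std::move(msg)), ok_(false) {}
  bool ok() const { return ok_; }
  const std::string& msg() const { return msg_; }

 private:
  Status() = default;
  std::string msg_;
  bool ok_ = true;
};

// Simplified stand-in for the reader type named in the patch.
struct ColumnStatsReader {
  int col_idx = -1;
};

// Style flagged in the comment:
//   std::pair<Status, ColumnStatsReader> CreateStatsReader(int col_idx);
//
// Suggested style: Status is the only return value and the result is written
// through an output parameter. 'reader' is left untouched on error.
Status CreateStatsReader(int col_idx, ColumnStatsReader* reader) {
  if (col_idx < 0) return Status("invalid column index");
  reader->col_idx = col_idx;
  return Status::OK();
}
```

A caller would then declare a ColumnStatsReader, pass its address, and check 
the returned Status before using the reader.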



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 18:11:07 +0000
Gerrit-HasComments: Yes
