Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
......................................................................


Patch Set 12:

(3 comments)

I have some high-level comments. I plan to go through the patch in more detail 
later.

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
Does this patch also lead to utilizing min/max filters per-row, similarly to 
bloom filters?


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549:     if ( eval_min_max ) {
I am wondering whether min/max runtime filters could be handled more like the 
existing stat filtering.

A possible idea is to split the new filter into distinct data_min>join_max and 
data_max<join_min conjuncts. If I understand things correctly, the only extra 
step these would need compared to existing stat filtering is that a slot would 
have to be filled from the filter's min or max value.

The advantage would be that fixes/hacks created for stat filtering would apply 
to both, e.g. TINYINT/SMALLINT/TIMESTAMP handling in 
https://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-stats.cc#L87
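
To make the suggestion concrete, here is a minimal sketch of the split, assuming 
plain int64 values. MinMaxFilter, PageStats and all function names below are 
hypothetical stand-ins for illustration, not Impala's actual classes or the 
patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a min/max runtime filter built from the join side.
struct MinMaxFilter {
  int64_t min;
  int64_t max;
};

// Hypothetical stand-in for the min/max statistics of a page or row group.
struct PageStats {
  int64_t min;
  int64_t max;
};

// Conjunct 1: data_min > join_max. The "slot" is filled from the filter's max.
bool DataMinAboveJoinMax(const PageStats& page, int64_t join_max) {
  return page.min > join_max;
}

// Conjunct 2: data_max < join_min. The "slot" is filled from the filter's min.
bool DataMaxBelowJoinMin(const PageStats& page, int64_t join_min) {
  return page.max < join_min;
}

// The page can be skipped if either conjunct holds, i.e. the data range and
// the join range do not overlap.
bool CanSkipPage(const PageStats& page, const MinMaxFilter& filter) {
  assert(filter.min <= filter.max);
  return DataMinAboveJoinMax(page, filter.max) ||
         DataMaxBelowJoinMin(page, filter.min);
}
```

Each conjunct compares one stat value against one constant, which is the shape 
the existing stat filtering already evaluates, so type-specific handling would 
only need to live in one place.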


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862
PS12, Line 862: TYPE_DATETIME
You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impala.



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 22:33:18 +0000
Gerrit-HasComments: Yes
