Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
......................................................................


Patch Set 12:

(3 comments)

I have some high-level comments. I plan to go through the patch in more detail later.

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
Does this patch also lead to min/max filters being applied per-row, similarly to bloom filters?


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549: if ( eval_min_max ) {
I am wondering whether min/max runtime filters could be handled more like the existing stat filtering. One possible idea is to split the new filter into two distinct conjuncts, data_min>join_max and data_max<join_min. If I understand things correctly, the only extra step these would need compared to existing stat filtering is that a slot would have to be filled from the filter's min or max value.

The advantage would be that fixes/hacks created for stat filtering would apply to both, e.g. the TINYINT/SMALLINT/TIMESTAMP handling in https://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-stats.cc#L87


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862
PS12, Line 862: TYPE_DATETIME
You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impala.


-- 
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 22:33:18 +0000
Gerrit-HasComments: Yes
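
For reference, a minimal sketch of the split proposed in the comment on hdfs-parquet-scanner.cc@549. It is only an illustration under assumptions: PageStats, JoinMinMax, and the three functions are hypothetical names, not Impala classes or APIs, and int64_t stands in for whatever column type is being filtered.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not Impala types.
struct PageStats {
  int64_t data_min;  // min value recorded in the Parquet statistics
  int64_t data_max;  // max value recorded in the Parquet statistics
};

struct JoinMinMax {
  int64_t join_min;  // min value carried by the runtime min/max filter
  int64_t join_max;  // max value carried by the runtime min/max filter
};

// Conjunct 1: every value in the data lies above the join's range.
inline bool DataMinAboveJoinMax(const PageStats& s, const JoinMinMax& f) {
  return s.data_min > f.join_max;
}

// Conjunct 2: every value in the data lies below the join's range.
inline bool DataMaxBelowJoinMin(const PageStats& s, const JoinMinMax& f) {
  return s.data_max < f.join_min;
}

// The data can be skipped when either conjunct holds, i.e. when the
// [data_min, data_max] and [join_min, join_max] ranges do not overlap.
inline bool CanSkip(const PageStats& s, const JoinMinMax& f) {
  return DataMinAboveJoinMax(s, f) || DataMaxBelowJoinMin(s, f);
}
```

Each conjunct compares one statistic against one constant, which is the shape the existing stat filtering already evaluates; the only new part would be filling that constant from the filter's min or max value.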