Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
......................................................................


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG@21
PS16, Line 21: evaluted
nit: evaluated


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549:   int64_t tuple_size = min_max_tuple_desc->byte_size();
> That seems a good idea, in that the new logic here can be moved over to the
Was the idea that whenever a min/max filter arrives, we could extend min_max_tuple_ with a pair of slots (min and max filter value) and min_max_conjunct_evals_ with two new predicates (filter_min <= data and filter_max >= data)?

Creating the slot descriptors dynamically can be cumbersome, so maybe we could just create the descriptors in advance, like we already do AFAICT, and only evaluate the conjuncts whose filters have arrived.

I think it's an interesting idea, and this direction is worth investigating. It could probably simplify the code a lot because we'd get row group-level, page-level, and row-level filtering for free.


http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@493
PS17, Line 493: std::pair<Status, ColumnStatsReader>
Nit: I think we usually return multiple values in output parameters, and the return value is only Status.

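To make the two predicates from the comment on hdfs-parquet-scanner.cc@549 concrete, here is a rough standalone sketch of how "filter_min <= data" and "filter_max >= data" could serve both statistics-level and row-level filtering. Everything in it (Range, MayContainMatch, RowPasses, PagesToRead) is hypothetical and not Impala code. It also assumes that, at the statistics level, the first predicate is checked against the maximum statistic and the second against the minimum statistic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration only; not an Impala class.
// Holds either the [min, max] statistics of a row group or page,
// or the [min, max] bounds carried by a min/max filter.
struct Range {
  int64_t min;
  int64_t max;
};

// Statistics-level check: returns true if the value range described by
// 'stats' overlaps the filter range, so the row group or page may contain
// a matching row. "filter.min <= data" is tested against the max statistic
// and "filter.max >= data" against the min statistic (an assumption).
inline bool MayContainMatch(const Range& stats, const Range& filter) {
  assert(stats.min <= stats.max);
  return filter.min <= stats.max && filter.max >= stats.min;
}

// Row-level check: the same two predicates applied to a single value.
inline bool RowPasses(int64_t value, const Range& filter) {
  return filter.min <= value && filter.max >= value;
}

// Returns the indexes of the pages that still have to be read; pages whose
// statistics cannot overlap the filter range are skipped.
inline std::vector<std::size_t> PagesToRead(
    const std::vector<Range>& page_stats, const Range& filter) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < page_stats.size(); ++i) {
    if (MayContainMatch(page_stats[i], filter)) result.push_back(i);
  }
  return result;
}
```

The same MayContainMatch check would apply unchanged to row-group statistics, which is the sense in which the three filtering levels could share one pair of predicates.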
-- 
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 18:11:07 +0000
Gerrit-HasComments: Yes