Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate
......................................................................


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
> Does this patch also leads to utilizing min/max filters per-row, similarly
That is an interesting thought. I think performance testing and collecting overlap information should give us some ideas. Per-row min/max evaluation may be advantageous for string data, since it may not need to examine every character in the string before finding an inequality.


http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
> I think this would be a good thing to do (I think the patch does this autom
Yes, the question of which order to apply them in is interesting. If we apply it on strings, evaluating min/max first probably makes sense.


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549: if ( eval_min_max ) {
> I am wondering if it is possible to handle min/max runtime filters more sim
That seems like a good idea: the new logic here can be moved into the min/max filter itself (e.g. to a new method EvalOverLap()) so that other types of HDFS scanners (e.g., ORC) can benefit. It would probably also simplify things a little here. Let me take a look into it.


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862
PS12, Line 862: TYPE_DATETIME
> You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impa
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Nov 2020 15:28:14 +0000
Gerrit-HasComments: Yes
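
For reference, the overlap check discussed at hdfs-parquet-scanner.cc@549 could look roughly like the sketch below. This is only an illustration and not the code in the patch: MinMaxRange, Overlaps() and CanSkipPage() are hypothetical names, and the sketch assumes a single integer column with closed [min, max] ranges for both the runtime filter and the page statistics.

```cpp
#include <cstdint>

// Hypothetical closed range [min, max]; stands in for either a min/max
// runtime filter or the min/max statistics of a Parquet page.
// This is not Impala's actual filter class.
struct MinMaxRange {
  int64_t min;
  int64_t max;
};

// Returns true if the two closed ranges share at least one value.
bool Overlaps(const MinMaxRange& filter, const MinMaxRange& stats) {
  return filter.min <= stats.max && stats.min <= filter.max;
}

// A page can be skipped when no value in its stats range can pass the filter,
// i.e. when the ranges do not overlap at all.
bool CanSkipPage(const MinMaxRange& filter, const MinMaxRange& stats) {
  return !Overlaps(filter, stats);
}
```

With a filter of [10, 20], a page whose stats are [0, 9] would be skipped, while a page with stats [20, 25] would still be read because the boundary value 20 is shared. Keeping the check on the range type rather than in the Parquet scanner is what would let another scanner (e.g., ORC) reuse it.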