Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
......................................................................


Patch Set 46:

(13 comments)

Left some comments, but the code LGTM overall.
Though I think I'll need to go over it a few more times.

http://gerrit.cloudera.org:8080/#/c/16720/46//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/46//COMMIT_MSG@54
PS46, Line 54: 2
nit: 3


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/exec/parquet/hdfs-parquet-scanner.h@591
PS46, Line 591: &
nit: for output values we usually use pointer types.


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/runtime/date-value.cc
File be/src/runtime/date-value.cc:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/runtime/date-value.cc@370
PS46, Line 370: days_since_epoch_ - days
nit: this expression is used 3 times in this function. Probably worth to 
extract it to a variable.


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/runtime/timestamp-value.cc@226
PS46, Line 226: date_ + boost::gregorian::date_duration(1),
This doesn't work well if time duration is greater than a day.


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter-ir.cc@32
PS46, Line 32: else {
I think we shouldn't have this 'else', because if we only insert a single 
element into the min/max filter then we want to set both min_ and max_.


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.h@118
PS46, Line 118: max(filter_min, data_min) - min(filter_max, data_max) + 1
It's reversed, it should be

 min(filter_max, data_max) - max(filter_min, data_min) + 1


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.cc@722
PS46, Line 722:         /* If the filter completely covers the data range, 
return 1.0*/
              :         if (filter_min <= data_min && data_max <= filter_max) {
              :           return 1.0;
              :         }
              :         double overlap_min = std::max(filter_min, data_min);
              :         double overlap_max = std::min(filter_max, data_max);
              :         return (float)((overlap_max - overlap_min + 1) / 
(data_max - data_min + 1));
This is the same for all the 3 branches, so it could be moved after the switch 
stmt.


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.cc@735
PS46, Line 735: min4_
min8_?


http://gerrit.cloudera.org:8080/#/c/16720/46/be/src/util/min-max-filter.cc@749
PS46, Line 749: min4_
min16_?


http://gerrit.cloudera.org:8080/#/c/16720/46/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/16720/46/common/thrift/ImpalaService.thrift@623
PS46, Line 623: greater than 0 to enable
You could mention that we can only set it to a value between 0.0 and 1.0, and 
1.0 means that the filter is always evaluated.


http://gerrit.cloudera.org:8080/#/c/16720/46/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16720/46/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@686
PS46, Line 686: overla
nit: overlap


http://gerrit.cloudera.org:8080/#/c/16720/46/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@732
PS46, Line 732: difficult
nit: word 'difficult' is redundant


http://gerrit.cloudera.org:8080/#/c/16720/46/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

http://gerrit.cloudera.org:8080/#/c/16720/46/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test@a42
PS46, Line 42: 
Why are the distributed plans being removed?



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 46
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Jan 2021 16:14:22 +0000
Gerrit-HasComments: Yes

Reply via email to