Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
......................................................................


Patch Set 19:

(8 comments)

Did a quick walkthrough, will look into it in detail next week.

http://gerrit.cloudera.org:8080/#/c/17075/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/19//COMMIT_MSG@19
PS19, Line 19: show_column_minmax_stats
Do we need this query option? I mean if we have min/max stats then we'd 
probably want to show them.


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@447
PS19, Line 447: (
nit: parentheses are not needed as the '.' member access takes precedence over 
the '&' adress-of operator,


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@492
PS19, Line 492: (
nit: parentheses not needed


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/hdfs-scanner.h@348
PS19, Line 348:     uint8_t enabled_for_rowgroup;
Why do we need this flag? If enabled_for_rowgroup is false, then the min/max 
filter is completely turned off, right? In that case we shouldn't even send 
them to the scanner, or am I missing something?


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/parquet/hdfs-parquet-scanner.cc@662
PS19, Line 662:     }
nit: EvaluateOverlapForRowGroup() is already quite long, maybe this code could 
go into a separate function.


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc@950
PS19, Line 950:
nit: indentation


http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
File fe/src/main/java/org/apache/impala/catalog/ColumnStats.java:

http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@50
PS19, Line 50: LOG
seems unused


http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@278
PS19, Line 278:    */
It would be good to handle Iceberg tables that use Parquet data files.



--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 17:02:09 +0000
Gerrit-HasComments: Yes

Reply via email to