Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
......................................................................


Patch Set 45:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@652
PS45, Line 652:     minmax_filter->DecideAlwaysTrueForOverlap(col_type, min_slot, max_slot, threshold);
> I think this would disable it for all subsequent row groups across all thre
Done


http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@657
PS45, Line 657:             << ", columnType=" << col_type.DebugString()
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@659
PS45, Line 659:             << ", data max=" << GetIntTypeValue(col_type, max_slot)
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h@76
PS45, Line 76:     always_true_ = !(ComputeOverlapRatio(type, data_min, data_max) < threshold);
> Filters can be read/evaluated from multiple threads, so this will be flagge
Good point!

Created a local copy in HdfsParquetScanner as suggested.

I plan to keep the modified logic for always_true_ in the (base class) min-max 
filter so that always_true_ can still be set.

Possible use cases in the hash join builder are the following:

1. Too many data values have been inserted (say over a threshold of 1000);
2. Sub-ranges are not selective enough.
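
For illustration only, a rough sketch of the local-copy idea (this is not the code in the patch). IntMinMaxFilter and DecideAlwaysTrueLocally are hypothetical names, and the ratio definition used here (the share of the data range that the filter range covers) is an assumption. The point is that the decision is returned to the caller rather than written into a filter object shared across threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer min/max filter; not Impala's class.
struct IntMinMaxFilter {
  int64_t min;
  int64_t max;
};

// Assumed definition: fraction of the inclusive data range
// [data_min, data_max] that the filter range [f.min, f.max] also covers.
inline double ComputeOverlapRatio(
    const IntMinMaxFilter& f, int64_t data_min, int64_t data_max) {
  assert(f.min <= f.max);
  assert(data_min <= data_max);
  const int64_t lo = std::max(f.min, data_min);
  const int64_t hi = std::min(f.max, data_max);
  if (lo > hi) return 0.0;  // Disjoint ranges: no overlap at all.
  // Subtract in double to avoid int64 overflow on very wide ranges.
  const double overlap =
      static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
  const double range =
      static_cast<double>(data_max) - static_cast<double>(data_min) + 1.0;
  return overlap / range;
}

// Returns the "always true" decision to the caller instead of storing it in
// the filter, so each scanner can keep its own copy and nothing shared is
// written.
inline bool DecideAlwaysTrueLocally(const IntMinMaxFilter& f,
    int64_t data_min, int64_t data_max, double threshold) {
  return !(ComputeOverlapRatio(f, data_min, data_max) < threshold);
}
```

With a filter range of [10, 20], data in [0, 99] and a threshold of 0.5, the ratio is below the threshold, so the filter stays enabled for that scanner. A filter range that covers the whole data range gives a ratio of 1.0 and is treated as always true.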


http://gerrit.cloudera.org:8080/#/c/16720/45/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/16720/45/tests/query_test/test_runtime_filters.py@267
PS45, Line 267: @SkipIfLocal.multiple_impalad
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 17:41:57 +0000
Gerrit-HasComments: Yes
