Aman Sinha has posted comments on this change.
(http://gerrit.cloudera.org:8080/17075)

Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12
PS21, Line 12: join builders and Parquet scanners
Is it fair to say that the main value proposition of the column min/max
statistics is in the join builder? The Parquet scanners already have access
to each row group's per-column min/max stats, so it seems to me that the stats
coming from HMS will not add value there. For the HJ builder, however, they do
help. Say that after the build phase the range is [10, 50] and the min/max
stats fetched from HMS are [60, 100]: we can then quickly conclude that the
runtime min/max filter will exclude all row groups.
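A minimal sketch of the overlap check described above, assuming the build range and the HMS stats are both closed integer intervals; `Range`, `overlaps` and `filter_can_prune_everything` are hypothetical names, not Impala code:

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Closed interval [lo, hi] of column values (hypothetical helper)."""
    lo: int
    hi: int


def overlaps(a: Range, b: Range) -> bool:
    """True if the closed intervals a and b share at least one value."""
    return a.lo <= b.hi and b.lo <= a.hi


def filter_can_prune_everything(build_range: Range, column_stats: Range) -> bool:
    """If the build-side range and the column's [min, max] stats are disjoint,
    a min/max filter built from build_range rejects every row group."""
    return not overlaps(build_range, column_stats)


# Values from the comment: build range [10, 50], HMS stats [60, 100].
print(filter_can_prune_everything(Range(10, 50), Range(60, 100)))  # True
```

The decision is only as good as the stats it reads; the same check against [40, 100] would return False because the intervals share [40, 50].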

But what happens if the stats are out of date? Since these stats are used not
just for ACID tables but for external tables as well, they are not bound to
the table's valid write id (which applies only to ACID tables). This can lead
to an incorrect overlap calculation. Any thoughts?
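To make the concern concrete, here is a small sketch with hypothetical values: the HMS stats say [60, 100], but a later write that did not refresh the stats added the value 30. The overlap check on stale stats decides to prune everything even though a matching row exists:

```python
def overlaps(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    """True if closed intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


# Build-side range of the join key after the build phase.
BUILD_LO, BUILD_HI = 10, 50
# Hypothetical column stats stored in HMS when stats were last computed.
STATS_MIN, STATS_MAX = 60, 100
# Hypothetical probe-side values after a later write that left the stats stale.
current_values = [30, 65, 90]

# Decision based on the stale stats: no overlap, so every row group is skipped.
prune_all = not overlaps(BUILD_LO, BUILD_HI, STATS_MIN, STATS_MAX)
# Ground truth: does any current value actually fall in the build range?
has_matches = any(BUILD_LO <= v <= BUILD_HI for v in current_values)

print(prune_all, has_matches)  # True True: the row with value 30 would be lost
</```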


http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221
PS21, Line 221: ---- QUERY
For the tests, it would be good to add a 3-table join case as well. My
understanding is that if you have a fact table F and 2 dimension tables D1, D2,
and F is joined to D1 and D2 as follows:
 F.a1 = D1.a1 AND F.a1 = D2.a1
then, if the first join's range is [10, 50] and the second join's range is
[20, 60], the filter seen at the scan of F will be [20, 50]. Are such
scenarios already tested elsewhere? If so, feel free to point me to those and
mark this resolved.
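A minimal sketch of the expected combined filter, assuming both min/max filters target the same scan column F.a1 and are applied conjunctively, so their ranges intersect; `intersect` and `combined_filter` are hypothetical helpers:

```python
from typing import List, Optional, Tuple

# Closed interval (lo, hi) of join-key values.
Range = Tuple[int, int]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def combined_filter(ranges: List[Range]) -> Optional[Range]:
    """Range a scan sees when several min/max filters on the same column
    are ANDed together; None means no row can pass."""
    result: Optional[Range] = ranges[0]
    for r in ranges[1:]:
        result = intersect(result, r)
        if result is None:
            return None
    return result


# F.a1 = D1.a1 yields [10, 50]; F.a1 = D2.a1 yields [20, 60].
print(combined_filter([(10, 50), (20, 60)]))  # (20, 50)
```

A test case with disjoint ranges, e.g. [10, 20] and [30, 40], would also be worth covering, since the combined filter should then reject every row group.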



--
To view, visit http://gerrit.cloudera.org:8080/17075

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 03:07:30 +0000
Gerrit-HasComments: Yes
