Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )
Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12
PS21, Line 12: join builders and Parquet scanners
Is it fair to say that the main value proposition of the column min/max statistics is in the join builder? The Parquet scanners already have access to each row group's per-column min/max stats, so it seems to me that the stats coming from HMS will not add much value there. For the HJ builder, however, they do help: if after the build phase the build range is, say, [10, 50] and the min/max stats fetched from HMS are [60, 100], we can quickly conclude that the runtime min/max filter will exclude all row groups.

But what happens if the stats are out of date? Since these stats are used not just for ACID tables but for external tables as well, they are not bound to the table's valid write ID (which applies only to ACID tables). This can lead to an incorrect overlap calculation. Any thoughts?


http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221
PS21, Line 221: ---- QUERY
For the tests, it would be good to add a 3-table join case as well. My understanding is that with a fact table F and two dimension tables D1 and D2, where F is joined to both as F.a1 = D1.a1 AND F.a1 = D2.a1, if the first join's range is [10, 50] and the second join's range is [20, 60], then the filter seen at the scan of F will be [20, 50]. Are such scenarios already tested elsewhere?
If so, feel free to point me to those and mark this resolved.


-- 
To view, visit http://gerrit.cloudera.org:8080/17075
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 03:07:30 +0000
Gerrit-HasComments: Yes
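A minimal sketch of the range logic discussed in the two review comments: the no-overlap check between a build-side range and the HMS column stats, and the intersection of two min/max filters that target the same scan column in the 3-table join case. It assumes closed integer ranges; `MinMaxRange`, `overlaps`, and `intersect` are hypothetical names for illustration, not Impala's actual classes or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinMaxRange:
    """Hypothetical closed range [lo, hi] carried by a min/max filter."""
    lo: int
    hi: int


def overlaps(a: MinMaxRange, b: MinMaxRange) -> bool:
    """Two closed ranges overlap iff neither lies entirely before the other."""
    return a.lo <= b.hi and b.lo <= a.hi


def intersect(a: MinMaxRange, b: MinMaxRange) -> Optional[MinMaxRange]:
    """Combine two min/max filters applied to the same scan column.

    Returns None when the ranges are disjoint, i.e. the combined filter
    would reject every row.
    """
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return MinMaxRange(lo, hi) if lo <= hi else None


# First comment: build-side range [10, 50] vs. HMS column stats [60, 100].
print(overlaps(MinMaxRange(10, 50), MinMaxRange(60, 100)))  # False

# Second comment: F.a1 = D1.a1 AND F.a1 = D2.a1 with ranges [10, 50], [20, 60].
print(intersect(MinMaxRange(10, 50), MinMaxRange(20, 60)))  # MinMaxRange(lo=20, hi=50)
```

Note that a False result from `overlaps` only justifies pruning every row group if the HMS stats reflect the table's current data, which is exactly the staleness concern raised in the first comment.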