[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Reviewed-on: http://gerrit.cloudera.org:8080/17821 Reviewed-by: Qifan Chen Tested-by: Impala Public Jenkins --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) Approvals: Qifan Chen: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 18 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 17:17:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > In some cases, FastLimitCheckExceededRows is 0, Yeah, the new code ignores the spilled partitions. Done. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 13:26:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7480/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 11:01:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 10:21:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9482/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:17:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:03:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > See my other comment. In some cases, FastLimitCheckExceededRows is 0, e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0; http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 1 > I see. What I mean is to assure the value is greater than 0. Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#16). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: (2 comments) Just like to double check on the expected test results. http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > I modified the unit test because autotest failed.FastLimitCheckExceededRows See my other comment. http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 0 > Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, the I see. What I mean is to assure the value is greater than 0. So maybe use [1-9][0-9]*? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 20 Sep 2021 16:13:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7477/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Sat, 18 Sep 2021 12:39:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > nit. 1? I modified the unit test because autotest failed.FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may contain 0 http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 0 > nit. Should it be 1-9 instead? Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, then check whether the number of rows returned more than limit, so FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may contain 0 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Sat, 18 Sep 2021 07:04:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9469/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Sat, 18 Sep 2021 06:33:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7477/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Sat, 18 Sep 2021 06:18:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#15). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/15 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7470/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 18:32:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: (2 comments) Looks super! Just a minor comment on the specs for skipped rows in two query tests. Otherwise, good for +2. http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: 0 nit. 1? http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 0 nit. Should it be 1-9 instead? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 13:27:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9456/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 12:30:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7470/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 12:12:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134 PS13, Line 134: runtime_profile_->AddInfoString("FastLimitCheckExceededRows", : SimpleItoa(aggs_[0]->GetNumKeys() - limit())); : VLOG_QUERY << S > If we can add the info to runtime_profile_, it will be more useful. For exa Done http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446 PS13, Line 446: GINT > Can we also verify that some rows are indeed skipped in spill situation? Done http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726 PS13, Line 2726: speed up aggregations > Can we verify that most of the rows are indeed skipped fast? Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 12:08:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#14). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 149 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/14 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: (4 comments) I think if we can improve the observability a little bit, it will be great. http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134 PS13, Line 134: VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned" : " from the streaming aggregation node has exceeded the limit of " : << limit(); If we can add the info to runtime_profile_, it will be more useful. For example, to verify that the feature is able to kick in in query tests. runtime_profile_->AddInfoString("Hdfs Read Thread Concurrency Bucket", ss.str()); http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > where id = subquery,If this subQuery returns 2 rows, we can sure that it is Okay. Looks this is a badly written query when it returns more one row. My fault. The following version runs fine on my box and I suppose your new feature should not kick in. select * from functional.alltypes where id in (select i from (select bigint_col as i from functional.alltypes union select tinyint_col as i from functional.alltypes) t ) ; http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446 PS13, Line 446: Verify Can we also verify that some rows are indeed skipped in spill situation? http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726 PS13, Line 2726: speed up aggregations Can we verify that most of the rows are indeed skipped fast? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 15:15:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9452/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 07:01:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: (6 comments) http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10 PS12, Line 10: > nit. exceed the max line threshold. Done http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76 PS12, Line 76: VLOG_QUERY << Substitute("the number of rows ($0) returned from the aggregation" : " node has exceeded the limit of $1", aggs_[0]->GetNumKeys(), limit()); > nit. May use Substitute() which is faster. Done http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479 PS12, Line 479: compl > nit. complete Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646 PS12, Line 646: When both conditions below are true, aggregati > nit. I think we should mention the two conditions in the commit message her Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648 PS12, Line 648: on n > nit. May use Complete, as Halt implies stop for some reason. Done http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > Since we do not push down id from the outer side to the inner, I would thin where id = subquery,If this subQuery returns 2 rows, we can sure that it is not meet the semantic requirement, we should report a semantic error.We don't need to get all the results. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#13). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 18 files changed, 142 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 12: (6 comments) Looks great. The comment on subquery rewrite test is orthogonal to this patch. http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10 PS12, Line 10: pre-aggregation nit. exceed the max line threshold. http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76 PS12, Line 76: VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned" : " from the aggregation node has exceeded the limit of " << limit(); nit. May use Substitute() which is faster. Example: orc-column-readers.cc: return Substitute("$0 column (ORC id=$1)", node->toString(), node->getColumnId()); sort-node.cc: VLOG(3) << Substitute("Sort estimation: estimated_full_input_size=$0 " http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479 PS12, Line 479: halt nit. complete http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646 PS12, Line 646: For SELECT DISTINCT f1,f2,...fn FROM t LIMIT n nit. I think we should mention the two conditions in the commit message here. The above SQL is an example. http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648 PS12, Line 648: Halt nit. May use Complete, as Halt implies stop for some reason. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a Since we do not push down id from the outer side to the inner, I would think we need to have ALL distinct values from the union side. If limit 2 is for the # of groups (i.e., taking only 2 groups, or 2 distinct values), then is it right? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 10 Sep 2021 13:40:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9448/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 10 Sep 2021 10:01:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 12: (8 comments) http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10 PS11, Line 10: When both con > nit. When both conditions below are true, Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h File be/src/exec/aggregation-node-base.h: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71 PS11, Line 71: bool fast_limit_check_ = false; > nit. May init to false here. Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93 PS11, Line 93: : Status AggregationNodeBase::Prepare(RuntimeState* state) { : SCOPED_TIMER(runtime_profile_->total_time_counter()); : RETURN_IF_ERROR(ExecNode::Prepare(state)); : for (auto& agg : aggs_) { : agg->SetDebugOptions > I also wonder if fast_limit_check_ can be moved to the plan node for Aggreg Done Only GroupingAggregator has limit, maybe limit should not be added to TAggregator http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75 PS11, Line 75: os = true; : VLOG_QUERY << "the > nit. May make the message more specific by plugging in the numbers. For exa Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129 PS11, Line 129: DCHECK(limit() > -1); > nit. We probably should DCHECK(limit() > -1) here before the IF. Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133 PS11, Line 133: child_batch_->Reset(); : VLOG_QUERY << "the > nit. same as the other one to make the message more specific. Done http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > nit. I wonder if this correct. On paper, we need to look at all distinct va This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a scalar subquery, the number of rows returned by this subquery cannot be greater than 1, so set limit = 2, and then check cardinality of result of subquery http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731 PS11, Line 2731: > This violates condition 1? Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 10 Sep 2021 09:39:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#12). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 18 files changed, 140 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/12 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 11: (8 comments) Looks pretty good. http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10 PS11, Line 10: In some cases nit. When both conditions below are true, http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h File be/src/exec/aggregation-node-base.h: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71 PS11, Line 71: bool fast_limit_check_; nit. May init to false here. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93 PS11, Line 93: if (num_aggs == 1 && limit() != -1 && pnode.aggs_[0]->GetNumGroupingExprs() > 0 : && pnode.aggs_[0]->aggregate_functions_.empty() : && pnode.aggs_[0]->conjuncts_.empty()) { : fast_limit_check_ = true; : GroupingAggregator* gpAgg = dynamic_cast(aggs_[0].get()); : gpAgg->UnsetLimit(); I also wonder if fast_limit_check_ can be moved to the plan node for Aggregation and set by FE. The same comment goes to limit. That is, set the limit in TAggregator in FE. In this way, the solution is cleaner. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75 PS11, Line 75: LOG_QUERY << "the number of rows returned by agg node has exceeded" : " the limit."; nit. May make the message more specific by plugging in the numbers. For example, the number of rows (3) returned from the aggregation node has exceeded the limit of 2. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129 PS11, Line 129: if (aggs_[0]->GetNumKeys() >= limit()) { nit. We probably should DCHECK(limit() > -1) here before the IF. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133 PS11, Line 133: VLOG_QUERY << "the number of rows returned by streaming agg node has exceeded" : " the limit."; nit. same as the other one to make the message more specific. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 nit. I wonder if this correct. On paper, we need to look at all distinct values (via grouping) to make a decision. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731 PS11, Line 2731: count(*) This violates condition 1? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 11 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 09 Sep 2021 15:08:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#11). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 129 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/11 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 11 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 02 Sep 2021 15:50:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7452/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 02 Sep 2021 09:37:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#10). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 129 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/10 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#3). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc@91 PS2, Line 91: // node has exceeded the limit we can complete the query without calculating all the data line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h@217 PS2, Line 217: void UnsetLimit() { limit_ = -1; } line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:26:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:25:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17821 Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 127 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao