[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits IMPALA-2581 added enforcement of the limit when adding entries to the grouping aggregation. It would stop adding new entries if the number of entries in the grouping aggregation was >= the limit. If the grouping aggregation never contains more entries than the limit, then it would not output more entries. However, this limit was not enforced exactly when adding. It would add a whole batch before checking the limit, so it can go past the limit. In practice the exchange in a distributed aggregation would enforce limits, so this would only show up when num_nodes=1. As a result, the following query incorrectly returns 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; One option is to be exact when adding items to the group aggregation, which would require testing the limit on each row (we don't know which are duplicates). This is awkward. Removing the limit on the output of the aggregation also is not really needed for the original change (stopping the children early once the limit is reached). Instead, we restore the limit on the output of the grouping agg (which is already known to work). Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Reviewed-on: http://gerrit.cloudera.org:8080/20379 Tested-by: Impala Public Jenkins Reviewed-by: Joe McDonnell --- M be/src/exec/aggregation-node-base.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 31 insertions(+), 22 deletions(-) Approvals: Impala Public Jenkins: Verified Joe McDonnell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 11 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 11 Sep 2023 16:59:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 10: Code-Review+1 This looks good to me. +1'ing to let others take a look, but I'm willing to go to +2. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 08 Sep 2023 03:07:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 23:19:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9684/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 18:59:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22 PS9, Line 22: using fast_limit_check, these should essentially be a no-op. > I rewrote the commit message. Do you think mentioning this is still needed? No, your new message is great, no need to mention this. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 17:39:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@9 PS9, Line 9: IMPALA-2581 added enforcement of the limit when adding entries to the : grouping aggregation. It would stop adding new entries if the number of : entries in the grouping aggregation was >= the limit. If the grouping : aggregation never contains more entries than the limit, then it would : not output more > Here's what I think the thought process was: Ack http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22 PS9, Line 22: select distinct l_orderkey from tpch.lineitem limit 10; > Nit: For a distributed plan, it won't impact the query result, but it does I rewrote the commit message. Do you think mentioning this is still needed? -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 17:19:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#10). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits IMPALA-2581 added enforcement of the limit when adding entries to the grouping aggregation. It would stop adding new entries if the number of entries in the grouping aggregation was >= the limit. If the grouping aggregation never contains more entries than the limit, then it would not output more entries. However, this limit was not enforced exactly when adding. It would add a whole batch before checking the limit, so it can go past the limit. In practice the exchange in a distributed aggregation would enforce limits, so this would only show up when num_nodes=1. As a result, the following query incorrectly returns 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; One option is to be exact when adding items to the group aggregation, which would require testing the limit on each row (we don't know which are duplicates). This is awkward. Removing the limit on the output of the aggregation also is not really needed for the original change (stopping the children early once the limit is reached). Instead, we restore the limit on the output of the grouping agg (which is already known to work). Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 31 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/10 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22 PS9, Line 22: using fast_limit_check, these should essentially be a no-op. Nit: For a distributed plan, it won't impact the query result, but it does impact the number of rows returned by the grouping aggregator to conform to the limit. So, can we say that it has no impact on the query result rather than saying it is a no-op? -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 17:17:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@9 PS9, Line 9: When IMPALA-2581 was implemented, it assumed all aggregation nodes would : have a pre-aggregation step that limits could be pushed to and therefore : removed limits on the final aggregation step. All distributed : aggregations have an exchange, and in practice the exchange would : enforce limits. Here's what I think the thought process was: IMPALA-2581 added enforcement of the limit when adding entries to the grouping aggregation. It would stop adding new entries if the number of entries in the grouping aggregation was >= the limit. If the grouping aggregation never contains more entries than the limit, then it would not output more entries. However, this limit was not enforced exactly when adding. It would add a whole batch before checking the limit, so it can go past the limit. One option is to be exact when adding items to the group agg, which would require testing the limit on each row (we don't know which are duplicates). This is awkward. Removing the limit on the output of the aggregation also is not really needed for the original change (stopping the children early once the limit is reached). Instead, we restore the limit on the output of the grouping agg (which is already known to work). -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 16:34:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@21 PS9, Line 21: This fix restores limits on the aggregation node; > nit. This fix properly configures the aggregate node to avoid early complet Is that update true? Early completion comes on the streaming sender side of aggregation, which still happens (if a streaming sender exists). In a distributed plan, the exchange also enforced the limit. This patch restores the limit on the aggregation node, which is also used to end once the limit is reached. I don't see how that "avoids early completion". http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a196 PS9, Line 196: > nit. noEarlyCompletion better describes the semantics. It's been removed, so this no longer exists. http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a218 PS9, Line 218: > nit. Can we use setNoEarlyCompletion() instead? This function was removed. It was originally added in https://gerrit.cloudera.org/c/17821/18/be/src/exec/grouping-aggregator.h#218. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 16:22:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: Code-Review+1 (4 comments) Looks great. http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@21 PS9, Line 21: This fix restores limits on the aggregation node; nit. This fix properly configures the aggregate node to avoid early completion, via the existent flag 'fast_limit_check', when num_nodes=1. In a true distributed plan, the fix does not alter the existing semantics. http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a196 PS9, Line 196: nit. noEarlyCompletion better describes the semantics. http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a218 PS9, Line 218: nit. Can we use setNoEarlyCompletion() instead? http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: > Joe asked why https://github.com/apache/impala/blob/master/be/src/exec/grou Good finding Joe. I love the reworked fix Michael. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 07 Sep 2023 13:24:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13936/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 06 Sep 2023 19:20:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#9). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to and therefore removed limits on the final aggregation step. All distributed aggregations have an exchange, and in practice the exchange would enforce limits. When num_nodes=1, that is not the case. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix restores limits on the aggregation node; in a distributed plan using fast_limit_check, these should essentially be a no-op. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 31 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/9 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: > The original version of this method deals with only logical properties. The Joe asked why https://github.com/apache/impala/blob/master/be/src/exec/grouping-aggregator.cc#L156 is set in the first place. That prevents the aggregation node from enforcing a limit, which seems to be the problem here. Removing UnsetLimit would fix this issue, and doesn't seem like it should cause problems since we want the result of the aggregation to be limited. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 06 Sep 2023 17:43:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: > What's is_serial_plan? I don't understand what case that's supposed to cove The original version of this method deals with only logical properties. The modified version adds physical properties, in particular the inclusion of the only distributed/parallel version of the aggregate operator. As a result, the the serial plan version is excluded. If we can test the serial plan version here, then we will not exclude a piece of functionality existed before. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 28 Aug 2023 13:06:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13844/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 25 Aug 2023 17:09:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#8). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 49 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/8 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777 PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_; > nit. May add a comment: useIntermediateTuple_ is set to true for any non m Done http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() > Okay. The test on useIntermediateTuple_ resolves my concern :-). I think I What's is_serial_plan? I don't understand what case that's supposed to cover. The link just points to where canCompleteEarly is called. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 25 Aug 2023 16:39:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: (2 comments) Looks good and I think we are converge toward +2. http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777 PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_; nit. May add a comment: useIntermediateTuple_ is set to true for any non merge nodes and endsMultiPhase_ is true for a merge node. http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() > Unless I misunderstood your example, we can't sort on partition keys Okay. The test on useIntermediateTuple_ resolves my concern :-). I think I am okay with the change for the distributed plans. In a serial plan, per https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L475, the aggregation node may lost the capability to bail out early. Can we mend it somehow as follows? return isSingleClassAgg() && hasLimit() && hasGrouping() && (is_serial_plan || isMultiPhase()) && !multiAggInfo_.hasAggregateExprs() && getConjuncts().isEmpty(); -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 25 Aug 2023 00:35:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: (1 comment) > Patch Set 7: > > (1 comment) Unless I misunderstood your example, we can't sort on partition keys > create table foo (s string) partitioned by (a int, b int) sort by (a, b); Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b) ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a' http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() Unless I misunderstood your example, we can't sort on partition keys > default> create table foo (s string) partitioned by (a int, b int) sort by > (a, b); Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b) ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a' If you go through and examine every AggregationNode we create in DistributedPlanner, they either setIntermediateTuple or setEndsMultiPhase, so this should have no functional difference with DistributedPlanner. The only case this should have any impact is when SingleNodePlanner creates a single AggregationNode, and there's nothing to push down to. When I look at examples for a similar case > create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d); where I've created 3 partitions and run > select distinct a, b from foo limit 2 the plans are unchanged with and without this patch Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - F02:ROOT 1 1 323.181us 323.181us 4.01 MB4.00 MB 04:EXCHANGE1 1 12.331us 12.331us 2 2 24.00 KB 16.00 KB UNPARTITIONED F01:EXCHANGE SENDER3 3 34.995us 43.647us 14.25 KB 48.00 KB 03:AGGREGATE 3 3 407.997us 513.288us 3 2 2.08 MB 10.00 MB FINALIZE 02:EXCHANGE3 3 10.555us 14.924us 3 2 16.00 KB 16.00 KB HASH(a,b) F00:EXCHANGE SENDER3 3 51.210us 60.198us 46.69 KB 144.00 KB 01:AGGREGATE 3 3 157.104us 183.497us 3 2 2.03 MB 10.00 MB STREAMING 00:SCAN HDFS 3 31.612ms1.962ms 3 2 32.00 KB 32.00 MB default.foo Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - F02:ROOT 1 1 64.470us 64.470us 4.01 MB4.00 MB 04:EXCHANGE1 19.816us9.816us 2 2 24.00 KB 16.00 KB UNPARTITIONED F01:EXCHANGE SENDER3 3 22.339us 23.764us 14.28 KB 48.00 KB 03:AGGREGATE 3 3 290.786us 323.453us 3 2 2.08 MB 10.00 MB FINALIZE 02:EXCHANGE3 3 16.727us 31.736us 3 2 24.00 KB 16.00 KB HASH(a,b) F00:EXCHANGE SENDER3 3 62.719us 69.838us 46.69 KB 144.00 KB 01:AGGREGATE 3 3 179.794us 196.761us 3 2 2.03 MB 10.00 MB STREAMING 00:SCAN HDFS 3 3 40.368ms 42.799ms 3 2 32.00 KB 32.00 MB default.foo Same for SingleNodePlanner. Same results with > select distinct c, d from foo limit 2 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() > isMultiPhase encompasses all nodes that are part of a chain of aggregation I see. Can you perform a simple performance test to see if this would negatively affect queries that a very small subset of non-merge aggregate nodes can provide the answer? For example, let us partition table T on column a, b into 10 partitions and sorted on a, b. The query is select distinct a, b from T limit 2. Normally, such query can finish as soon as two smallest subsets of rows (on a, b) are read in. By reading the code here, my understand is that with the change we can not complete early until on all read nodes (from 10 partitions) are done the work and we can complete early only at the very top merge node is active. True? -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 20:46:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#7). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. - passed an exhaustive test run. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 47 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/7 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9631/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 24 Aug 2023 16:33:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13814/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 19:10:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13813/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 19:08:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#6). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 47 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/6 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@83 PS4, Line 83: finalize > nit node? Done http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@172 PS4, Line 172: Preconditions.checkState(needsFinalize_ && !endsMultiPhase_); > && !endsMultiPhase_ Done http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() > I wonder if this effectively requires all non-root aggregation nodes (i.e., isMultiPhase encompasses all nodes that are part of a chain of aggregation nodes. Most of them useIntermediateTuple_. So this change should not introduce any behavioral changes in DistributedPlanner, and only affects SingleNodePlanner for simple cases (no aggregate expressions or conjuncts). Is there a particular case it would be useful to test? It might be possible to fix this by counting. It would require adding additional logic to double-check and possibly truncate results coming from aggregating individual partitions. Fixing one case that was missed in the planning stage seemed more straight-forward. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 18:42:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#5). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores definition of ALL_CLUSTER_SIZES and makes it simpler to enable for individual test suites. Filed IMPALA-12394 to generally re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES for aggregation tests. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 6 files changed, 46 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/5 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py@168 PS5, Line 168: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py@168 PS5, Line 168: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/test_dimensions.py File tests/common/test_dimensions.py: http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/test_dimensions.py@215 PS5, Line 215: def create_exec_option_dimension(cluster_sizes=ALL_NODES_ONLY, flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 18:43:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 4: (4 comments) The change looks great. Just have one question on performance. http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85 PS1, Line 85: endsMultiPhase_ = f > Looked into this a bit more. Most of those paths don't use pushdown because Done http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@83 PS4, Line 83: finalize nit node? http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@172 PS4, Line 172: Preconditions.checkState(needsFinalize_); && !endsMultiPhase_ http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786 PS4, Line 786: && isMultiPhase() I wonder if this effectively requires all non-root aggregation nodes (i.e., their endsMultiPhase_ is false) to complete their work first, prior to the merge aggregate node. If so, this may cause performance degradation. Is it possible to fix this bug in backend by counting number of values x coming up at the merge aggregate and bailing out of the loop when x >= n? -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 18:11:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/4/tests/common/test_dimensions.py File tests/common/test_dimensions.py: http://gerrit.cloudera.org:8080/#/c/20379/4/tests/common/test_dimensions.py@a199 PS4, Line 199: This has been here so long that I might need to re-enable it gradually. Currently 214 test failures in an exhaustive run to look through. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 18:03:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/20379/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777 PS2, Line 777: return useIntermediateTuple_ || endsMultiPhase_; >From my inspection, useStreamingPreagg_ is a subset of useIntermediateTuple_. >So removing it should be fine. http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1131 PS4, Line 1131: agg.setEndsMultiPhase(); This might be causing a bunch of num_nodes=1 test failures. Looking into it. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 22 Aug 2023 17:40:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13794/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 21:34:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13793/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 21:33:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 4: (2 comments) Used https://gerrit.cloudera.org/c/17821/18/testdata/workloads/targeted-perf/queries/aggregation.test to verify that FastLimitCheck still happens for DistributedPlanner. http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85 PS1, Line 85: endsMultiPhase_ = f > It's not. useStreamingPreagg is only set on the pre-aggregation step. Looked into this a bit more. Most of those paths don't use pushdown because they have conjuncts and/or aggregate expressions. Mainly seems to affect simple select distinct/group by queries. I added one other case to the test, but it wasn't failing before. http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py@405 PS3, Line 405: ( > flake8: E131 continuation line unaligned for hanging indent Done -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 21:14:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#4). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores running with num_nodes=0 and num_nodes=1 for default test dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter tests with num_nodes=1 due to finstance expectations. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py M tests/query_test/test_runtime_filters.py 6 files changed, 39 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/4 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#3). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores running with num_nodes=0 and num_nodes=1 for default test dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter tests with num_nodes=1 due to finstance expectations. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py M tests/query_test/test_runtime_filters.py 6 files changed, 39 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/3 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py@405 PS3, Line 405: " flake8: E131 continuation line unaligned for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 21:08:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 19:30:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@12 PS1, Line 12: i > would incorrectly return Done http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@11 PS1, Line 11: As a : result, > nit. As a result, Done http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@17 PS1, Line 17: This fix i > nit. This fix identifies Done -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 19:05:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20379 to look at the new patch set (#2). Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. As a result, the following query would incorrectly return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; This fix identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores running with num_nodes=0 and num_nodes=1 for default test dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter tests with num_nodes=1 due to finstance expectations. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py M tests/query_test/test_runtime_filters.py 5 files changed, 30 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/2 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85 PS1, Line 85: hasStreamingPreagg_ > I wonder if this new variable is logically equivalent to the following It's not. useStreamingPreagg is only set on the pre-aggregation step. All these flags are difficult to reason about though. I'm going to spend some time with them; there are 3 places we create aggregation nodes: - SingleNodePlanner#createAggregationPlan, which can add distinct and transpose phases - DistributedPlanner#createMergeAggregationFragment - DistributedPlanner#createPhase2DistinctAggregationFragment I think I need to understand how to generate all these and make sure we have test coverage. Maybe I can come up with a clearer way to describe the different use cases. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 Aug 2023 17:32:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 1: (4 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@11 PS1, Line 11: For : example nit. As a result, http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@12 PS1, Line 12: would incorrectly return http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@17 PS1, Line 17: Identifies nit. This fix identifies http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85 PS1, Line 85: hasStreamingPreagg_ I wonder if this new variable is logically equivalent to the following needsFinalize_ && useStreamingPreagg_ -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 Aug 2023 16:23:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 ) Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13780/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 18 Aug 2023 19:10:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20379 Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits .. IMPALA-12383: Fix SingleNodePlanner aggregation limits When IMPALA-2581 was implemented, it assumed all aggregation nodes would have a pre-aggregation step that limits could be pushed to. That's not the case when using SingleNodePlanner, such as when num_nodes=1. For example, the following query would return 16 rows, not 10: set num_nodes=1; select distinct l_orderkey from tpch.lineitem limit 10; Identifies all aggregation nodes that use pre-aggregation so we use fast_limit_check in only those cases. Testing: - added a test case where we assert number of rows returned by an aggregation node (rather than an exchange or top-n). - restores running with num_nodes=0 and num_nodes=1 for default test dimensions; IMPALA-561 was fixed ages ago. Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M tests/common/test_dimensions.py M tests/query_test/test_aggregation.py 4 files changed, 27 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/1 -- To view, visit http://gerrit.cloudera.org:8080/20379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816 Gerrit-Change-Number: 20379 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith