[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-11 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

IMPALA-2581 added enforcement of the limit when adding entries to the
grouping aggregation. It would stop adding new entries if the number of
entries in the grouping aggregation was >= the limit. If the grouping
aggregation never contains more entries than the limit, then it would
not output more entries.

However, this limit was not enforced exactly when adding. It would add a
whole batch before checking the limit, so it can go past the limit. In
practice the exchange in a distributed aggregation would enforce limits,
so this would only show up when num_nodes=1. As a result, the following
query incorrectly returns 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

One option is to be exact when adding items to the group aggregation,
which would require testing the limit on each row (we don't know which
are duplicates). This is awkward. Removing the limit on the output of
the aggregation also is not really needed for the original change
(stopping the children early once the limit is reached). Instead, we
restore the limit on the output of the grouping agg (which is already
known to work).

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Reviewed-on: http://gerrit.cloudera.org:8080/20379
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 31 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-11 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 11 Sep 2023 16:59:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 10: Code-Review+1

This looks good to me. +1'ing to let others take a look, but I'm willing to go 
to +2.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 08 Sep 2023 03:07:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 23:19:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9684/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 18:59:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22
PS9, Line 22: using fast_limit_check, these should essentially be a no-op.
> I rewrote the commit message. Do you think mentioning this is still needed?
No, your new message is great, no need to mention this.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 17:39:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@9
PS9, Line 9: IMPALA-2581 added enforcement of the limit when adding entries to 
the
   : grouping aggregation. It would stop adding new entries if the 
number of
   : entries in the grouping aggregation was >= the limit. If the 
grouping
   : aggregation never contains more entries than the limit, then it 
would
   : not output more
> Here's what I think the thought process was:
Ack


http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22
PS9, Line 22:   select distinct l_orderkey from tpch.lineitem limit 10;
> Nit: For a distributed plan, it won't impact the query result, but it does
I rewrote the commit message. Do you think mentioning this is still needed?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 17:19:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#10).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

IMPALA-2581 added enforcement of the limit when adding entries to the
grouping aggregation. It would stop adding new entries if the number of
entries in the grouping aggregation was >= the limit. If the grouping
aggregation never contains more entries than the limit, then it would
not output more entries.

However, this limit was not enforced exactly when adding. It would add a
whole batch before checking the limit, so it can go past the limit. In
practice the exchange in a distributed aggregation would enforce limits,
so this would only show up when num_nodes=1. As a result, the following
query incorrectly returns 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

One option is to be exact when adding items to the group aggregation,
which would require testing the limit on each row (we don't know which
are duplicates). This is awkward. Removing the limit on the output of
the aggregation also is not really needed for the original change
(stopping the children early once the limit is reached). Instead, we
restore the limit on the output of the grouping agg (which is already
known to work).

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 31 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/10
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@22
PS9, Line 22: using fast_limit_check, these should essentially be a no-op.
Nit: For a distributed plan, it won't impact the query result, but it does 
impact the number of rows returned by the grouping aggregator to conform to the 
limit. So, can we say that it has no impact on the query result rather than 
saying it is a no-op?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 17:17:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@9
PS9, Line 9: When IMPALA-2581 was implemented, it assumed all aggregation nodes 
would
   : have a pre-aggregation step that limits could be pushed to and 
therefore
   : removed limits on the final aggregation step. All distributed
   : aggregations have an exchange, and in practice the exchange would
   : enforce limits.
Here's what I think the thought process was:
IMPALA-2581 added enforcement of the limit when adding entries to the grouping 
aggregation. It would stop adding new entries if the number of entries in the 
grouping aggregation was >= the limit. If the grouping aggregation never 
contains more entries than the limit, then it would not output more entries.

However, this limit was not enforced exactly when adding. It would add a whole 
batch before checking the limit, so it can go past the limit.

One option is to be exact when adding items to the group agg, which would 
require testing the limit on each row (we don't know which are duplicates). 
This is awkward. Removing the limit on the output of the aggregation also is 
not really needed for the original change (stopping the children early once the 
limit is reached).

Instead, we restore the limit on the output of the grouping agg (which is 
already known to work).



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 16:34:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@21
PS9, Line 21: This fix restores limits on the aggregation node;
> nit. This fix properly configures the aggregate node to avoid early complet
Is that update true?

Early completion comes on the streaming sender side of aggregation, which still 
happens (if a streaming sender exists). In a distributed plan, the exchange 
also enforced the limit. This patch restores the limit on the aggregation node, 
which is also used to end once the limit is reached. I don't see how that 
"avoids early completion".


http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a196
PS9, Line 196:
> nit. noEarlyCompletion better describes the semantics.
It's been removed, so this no longer exists.


http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a218
PS9, Line 218:
> nit. Can we use setNoEarlyCompletion() instead?
This function was removed. It was originally added in 
https://gerrit.cloudera.org/c/17821/18/be/src/exec/grouping-aggregator.h#218.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 16:22:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9: Code-Review+1

(4 comments)

Looks great.

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@21
PS9, Line 21: This fix restores limits on the aggregation node;
nit. This fix properly configures the aggregate node to avoid early completion, 
via the existent flag 'fast_limit_check', when num_nodes=1.  In a true 
distributed plan, the fix does not alter the existing semantics.


http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a196
PS9, Line 196:
nit. noEarlyCompletion better describes the semantics.


http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a218
PS9, Line 218:
nit. Can we use setNoEarlyCompletion() instead?


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:
> Joe asked why https://github.com/apache/impala/blob/master/be/src/exec/grou
Good finding Joe. I love the reworked fix Michael.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Sep 2023 13:24:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13936/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 06 Sep 2023 19:20:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-06 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#9).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to and therefore
removed limits on the final aggregation step. All distributed
aggregations have an exchange, and in practice the exchange would
enforce limits.

When num_nodes=1, that is not the case. As a result, the following query
would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix restores limits on the aggregation node; in a distributed plan
using fast_limit_check, these should essentially be a no-op.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 31 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/9
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-06 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:
> The original version of this method deals with only logical properties. The
Joe asked why 
https://github.com/apache/impala/blob/master/be/src/exec/grouping-aggregator.cc#L156
 is set in the first place. That prevents the aggregation node from enforcing a 
limit, which seems to be the problem here. Removing UnsetLimit would fix this 
issue, and doesn't seem like it should cause problems since we want the result 
of the aggregation to be limited.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 06 Sep 2023 17:43:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-28 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:
> What's is_serial_plan? I don't understand what case that's supposed to cove
The original version of this method deals with only logical properties. The 
modified version adds physical properties, in particular the inclusion of the 
only distributed/parallel version of the aggregate operator.  As a result, the 
the serial plan version is excluded.

If we can test the serial plan version here, then we will not exclude a piece 
of functionality existed before.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 28 Aug 2023 13:06:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13844/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 17:09:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-25 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#8).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 49 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/8
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-25 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
> nit. May add a comment:  useIntermediateTuple_ is set to true for any non m
Done


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> Okay. The test on useIntermediateTuple_ resolves my concern :-).  I think I
What's is_serial_plan? I don't understand what case that's supposed to cover. 
The link just points to where canCompleteEarly is called.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 16:39:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(2 comments)

Looks good and I think we are converge toward +2.

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
nit. May add a comment:  useIntermediateTuple_ is set to true for any non merge 
nodes and endsMultiPhase_ is true for a merge node.


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> Unless I misunderstood your example, we can't sort on partition keys
Okay. The test on useIntermediateTuple_ resolves my concern :-).  I think I am 
okay with the change for the distributed plans.

In a serial plan, per 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L475,
 the aggregation node may lost the capability to bail out early. Can we mend it 
somehow as follows?

return isSingleClassAgg() && hasLimit() && hasGrouping() && (is_serial_plan || 
isMultiPhase())
   && !multiAggInfo_.hasAggregateExprs() && getConjuncts().isEmpty();



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(1 comment)

> Patch Set 7:
>
> (1 comment)

Unless I misunderstood your example, we can't sort on partition keys

> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition 
column: 'a'

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
Unless I misunderstood your example, we can't sort on partition keys

> default> create table foo (s string) partitioned by (a int, b int) sort by 
> (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition 
column: 'a'

If you go through and examine every AggregationNode we create in 
DistributedPlanner, they either setIntermediateTuple or setEndsMultiPhase, so 
this should have no functional difference with DistributedPlanner. The only 
case this should have any impact is when SingleNodePlanner creates a single 
AggregationNode, and there's nothing to push down to.

When I look at examples for a similar case

> create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

where I've created 3 partitions and run

> select distinct a, b from foo limit 2

the plans are unchanged with and without this patch

Operator  #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. 
#Rows  Peak Mem  Est. Peak Mem  Detail

-
F02:ROOT   1  1  323.181us  323.181us   
   4.01 MB4.00 MB 
04:EXCHANGE1  1   12.331us   12.331us  2   
2  24.00 KB   16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER3  3   34.995us   43.647us   
  14.25 KB   48.00 KB
03:AGGREGATE   3  3  407.997us  513.288us  3   
2   2.08 MB   10.00 MB  FINALIZE
02:EXCHANGE3  3   10.555us   14.924us  3   
2  16.00 KB   16.00 KB  HASH(a,b) 
F00:EXCHANGE SENDER3  3   51.210us   60.198us   
  46.69 KB  144.00 KB
01:AGGREGATE   3  3  157.104us  183.497us  3   
2   2.03 MB   10.00 MB  STREAMING
00:SCAN HDFS   3  31.612ms1.962ms  3   
2  32.00 KB   32.00 MB  default.foo

Operator  #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. 
#Rows  Peak Mem  Est. Peak Mem  Detail

-
F02:ROOT   1  1   64.470us   64.470us   
   4.01 MB4.00 MB
04:EXCHANGE1  19.816us9.816us  2   
2  24.00 KB   16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER3  3   22.339us   23.764us   
  14.28 KB   48.00 KB
03:AGGREGATE   3  3  290.786us  323.453us  3   
2   2.08 MB   10.00 MB  FINALIZE
02:EXCHANGE3  3   16.727us   31.736us  3   
2  24.00 KB   16.00 KB  HASH(a,b)
F00:EXCHANGE SENDER3  3   62.719us   69.838us   
  46.69 KB  144.00 KB   
01:AGGREGATE   3  3  179.794us  196.761us  3   
2   2.03 MB   10.00 MB  STREAMING
00:SCAN HDFS   3  3   40.368ms   42.799ms  3   
2  32.00 KB   32.00 MB  default.foo

Same for SingleNodePlanner. Same results with

> select distinct c, d from foo limit 2



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> isMultiPhase encompasses all nodes that are part of a chain of aggregation
I see.

Can you perform a simple performance test to see if this would negatively 
affect queries that a very small subset of non-merge aggregate nodes can 
provide the answer?

For example, let us partition table T on column a, b into 10 partitions and 
sorted on a, b. The query is
select distinct a, b from T limit 2.

Normally, such query can finish as soon as two smallest subsets of rows (on a, 
b) are read in.

By reading the code here, my understand is that with the change we can not 
complete early until on all read nodes (from 10 partitions) are done the work 
and we can complete early only at the very top merge node is active. True?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:46:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#7).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.
- passed an exhaustive test run.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 47 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/7
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9631/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:33:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13814/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 19:10:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13813/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 19:08:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#6).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 47 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/6
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@83
PS4, Line 83: finalize
> nit node?
Done


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@172
PS4, Line 172: Preconditions.checkState(needsFinalize_ && !endsMultiPhase_);
> && !endsMultiPhase_
Done


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> I wonder if this effectively requires all non-root aggregation nodes (i.e.,
isMultiPhase encompasses all nodes that are part of a chain of aggregation 
nodes. Most of them useIntermediateTuple_. So this change should not introduce 
any behavioral changes in DistributedPlanner, and only affects 
SingleNodePlanner for simple cases (no aggregate expressions or conjuncts).

Is there a particular case it would be useful to test?

It might be possible to fix this by counting. It would require adding 
additional logic to double-check and possibly truncate results coming from 
aggregating individual partitions. Fixing one case that was missed in the 
planning stage seemed more straight-forward.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:42:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#5).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores definition of ALL_CLUSTER_SIZES and makes it simpler to
  enable for individual test suites. Filed IMPALA-12394 to generally
  re-enable testing with ALL_CLUSTER_SIZES. Enables ALL_CLUSTER_SIZES
  for aggregation tests.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
6 files changed, 46 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/5
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py@168
PS5, Line 168:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/impala_test_suite.py@168
PS5, Line 168:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/20379/5/tests/common/test_dimensions.py@215
PS5, Line 215: def create_exec_option_dimension(cluster_sizes=ALL_NODES_ONLY,
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:43:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

(4 comments)

The change looks great.

Just have one question on performance.

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: endsMultiPhase_ = f
> Looked into this a bit more. Most of those paths don't use pushdown because
Done


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@83
PS4, Line 83: finalize
nit node?


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@172
PS4, Line 172: Preconditions.checkState(needsFinalize_);
&& !endsMultiPhase_


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
I wonder if this effectively requires all non-root aggregation nodes (i.e., 
their endsMultiPhase_ is false) to complete their work first, prior to the 
merge aggregate node. If so, this may cause performance degradation.

Is it possible to fix this bug in backend by counting number of values x coming 
up at the merge aggregate and bailing out of the loop when x >= n?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:11:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/20379/4/tests/common/test_dimensions.py@a199
PS4, Line 199:
This has been here so long that I might need to re-enable it gradually. 
Currently 214 test failures in an exhaustive run to look through.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:03:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20379/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/2/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS2, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
>From my inspection, useStreamingPreagg_ is a subset of useIntermediateTuple_. 
>So removing it should be fine.


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1131
PS4, Line 1131: agg.setEndsMultiPhase();
This might be causing a bunch of num_nodes=1 test failures. Looking into it.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 17:40:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13794/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 21:34:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13793/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 21:33:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

(2 comments)

Used 
https://gerrit.cloudera.org/c/17821/18/testdata/workloads/targeted-perf/queries/aggregation.test
 to verify that FastLimitCheck still happens for DistributedPlanner.

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: endsMultiPhase_ = f
> It's not. useStreamingPreagg is only set on the pre-aggregation step.
Looked into this a bit more. Most of those paths don't use pushdown because 
they have conjuncts and/or aggregate expressions. Mainly seems to affect simple 
select distinct/group by queries. I added one other case to the test, but it 
wasn't failing before.


http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py@405
PS3, Line 405: (
> flake8: E131 continuation line unaligned for hanging indent
Done



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 21:14:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#4).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores running with num_nodes=0 and num_nodes=1 for default test
  dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter
  tests with num_nodes=1 due to finstance expectations.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_runtime_filters.py
6 files changed, 39 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/4
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#3).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores running with num_nodes=0 and num_nodes=1 for default test
  dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter
  tests with num_nodes=1 due to finstance expectations.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_runtime_filters.py
6 files changed, 39 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/3
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20379/3/tests/query_test/test_aggregation.py@405
PS3, Line 405: "
flake8: E131 continuation line unaligned for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 21:08:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13792/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 19:30:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@12
PS1, Line 12: i
> would incorrectly return
Done


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@11
PS1, Line 11: As a
: result,
> nit. As a result,
Done


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@17
PS1, Line 17: This fix i
> nit. This fix identifies
Done



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 19:05:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20379

to look at the new patch set (#2).

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. As a
result, the following query would incorrectly return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

This fix identifies all aggregation nodes that use pre-aggregation so we
use fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores running with num_nodes=0 and num_nodes=1 for default test
  dimensions; IMPALA-561 was fixed ages ago. Disables runtime filter
  tests with num_nodes=1 due to finstance expectations.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_runtime_filters.py
5 files changed, 30 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/2
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-21 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: hasStreamingPreagg_
> I wonder if this new variable is logically equivalent to the following
It's not. useStreamingPreagg is only set on the pre-aggregation step.

All these flags are difficult to reason about though. I'm going to spend some 
time with them; there are 3 places we create aggregation nodes:
- SingleNodePlanner#createAggregationPlan, which can add distinct and transpose 
phases
- DistributedPlanner#createMergeAggregationFragment
- DistributedPlanner#createPhase2DistinctAggregationFragment

I think I need to understand how to generate all these and make sure we have 
test coverage. Maybe I can come up with a clearer way to describe the different 
use cases.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 21 Aug 2023 17:32:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 1:

(4 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@11
PS1, Line 11: For
: example
nit. As a result,


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@12
PS1, Line 12:
would incorrectly return


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@17
PS1, Line 17: Identifies
nit. This fix identifies


http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: hasStreamingPreagg_
I wonder if this new variable is logically equivalent to the following

needsFinalize_ && useStreamingPreagg_



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 Aug 2023 16:23:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13780/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Aug 2023 19:10:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-18 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20379


Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

IMPALA-12383: Fix SingleNodePlanner aggregation limits

When IMPALA-2581 was implemented, it assumed all aggregation nodes would
have a pre-aggregation step that limits could be pushed to. That's not
the case when using SingleNodePlanner, such as when num_nodes=1. For
example, the following query would return 16 rows, not 10:

  set num_nodes=1;
  select distinct l_orderkey from tpch.lineitem limit 10;

Identifies all aggregation nodes that use pre-aggregation so we use
fast_limit_check in only those cases.

Testing:
- added a test case where we assert number of rows returned by an
  aggregation node (rather than an exchange or top-n).
- restores running with num_nodes=0 and num_nodes=1 for default test
  dimensions; IMPALA-561 was fixed ages ago.

Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M tests/common/test_dimensions.py
M tests/query_test/test_aggregation.py
4 files changed, 27 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/20379/1
--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith