[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Reviewed-on: http://gerrit.cloudera.org:8080/6963
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 984 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/794/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12: Code-Review+2

carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12:

Removed the preconditions check - it was too strict for the case when the hash 
join was in a subplan.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6963

to look at the new patch set (#12).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 984 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 11: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 11: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: // default buffer size, e.g. with small dimension tables.
> Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs
Exactly, thanks. I think there might be some non-trivial plumbing involved, so 
better to do later.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

PS 10 is a rebase and commit message update

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 986 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Increase explain level for TPC-H and TPC-DS to provide visibility into
the memory estimates on those data sets.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 986 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: // default buffer size, e.g. with small dimension tables.
> That's correct. Ultimately the planner can only guess because we do the sch
Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs to 
be fixed so it might be helpful to add any ideas you had in there.


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186: if (getChild(1).getCardinality() == -1 || 
getChild(1).getAvgRowSize() == -1
> Agree
Done. Made it check for <= 0 since there are a couple of bugs in the numNodes 
calculation that I fixed in https://gerrit.cloudera.org/#/c/7223/ where it 
could return 0.


Line 190: } else {
> Add a Preconditions check that asserts MT_DOP=0 with a comment that the est
We still need to generate plans in planner tests, so I made the DCHECK 
conditional on whether it's a test environment.


http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 629:   protected final static long getDefaultSpillableBufferBytes() {
> Why this indirection?
To make the intent clear at the callsites. The HDFS read size and the spillable 
buffer size will be decoupled in IMPALA-3200. It will be easier to fix that up 
if we make the distinction in the planner already.


http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test:

Line 2: select *
> use straight_join for these tests
Done


Line 761: select string_col, count(*)
> Add a similar test with no stats for the join case (or integrate into this 
I have a test above " Join with no stats for right input - should use default 
buffers." Or did you have an additional test in mind?


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: // default buffer size, e.g. with small dimension tables.
> The effective MT_DOP would depend somewhat on how the scan ranges were divi
That's correct. Ultimately the planner can only guess because we do the 
scheduling in the BE. I think we should be conservative and not underestimate, 
but at the same time the number of fragment instances can definitely not be 
higher than the total number of scan ranges.

In your example, I don't think it's safe to assume that the 4 ranges would be 
spread across 4 hosts (but that's probably what the planner will think in terms 
of #hosts). Also I don't think the number of hosts and scan ranges will vary 
dramatically in practice, because the planner estimates the number of hosts 
based on the scan-range replicas. So with a replication factor of R and N scan 
ranges you cannot have more than R*N hosts. Ultimately the number of hosts is 
not very reliable either.

I think addressing this comment may be a little involved, so how about filing a 
JIRA and deferring for now?


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186: if (getChild(1).getCardinality() == -1 || 
getChild(1).getAvgRowSize() == -1
> I'm not sure what this was about - I just kept this from the existing code.
Agree


Line 190: } else {
> This might need some more thought about how this should work with parallel 
Add a Preconditions check that asserts MT_DOP=0 with a comment that the 
estimation is not quite right with parallel plans?


http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 629:   protected final static long getDefaultSpillableBufferBytes() {
Why this indirection?


http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test:

Line 2: select *
use straight_join for these tests


Line 761: select string_col, count(*)
Add a similar test with no stats for the join case (or integrate into this one)


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

(1 comment)

Any thoughts?

http://gerrit.cloudera.org:8080/#/c/6963/8//COMMIT_MSG
Commit Message:

Line 21: Increase explain level for TPC-H and TPC-DS to provide visibility into
Need to remove


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

Cleaned up this patch, removing the increased explain level of the existing 
planner tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Increase explain level for TPC-H and TPC-DS to provide visibility into
the memory estimates on those data sets.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 983 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: bufferSize = Math.min(bufferSize, 
Math.max(minSpillableBufferBytes_,
> Estimates are still conservative but more accurate than before - great! The
The effective MT_DOP would depend somewhat on how the scan ranges were divided 
between hosts too, right? E.g. if we had 4 scan ranges across 100 hosts, and 
mt_dop=4, could the planner assume that the scan ranges end up on different 
hosts?


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186: || numNodes_ == 0) {
> What about numNodes_ == -1?
I'm not sure what this was about - I just kept this from the existing code. We 
don't actually use num_nodes_ in the calculation so maybe it's defunct and we 
should instead use num_instances_.


Line 190:   perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_
> For broadcast, it seems we should count the full HT size only once per node
This might need some more thought about how this should work with parallel 
plans and separate join builds. It seems like once that's separated out more, 
we would compute the resource requirement for the single instance of the join 
build sink, rather than divide the requirement between the different fragment 
instances.


Line 193: perInstanceDataBytes /= 
fragment_.getNumInstances(queryOptions.getMt_dop());
> getNumInstances() could return -1
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java
File fe/src/main/java/org/apache/impala/util/BitUtil.java:

Line 1: package org.apache.impala.util;
> Apache header
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 409: // vary the spillable buffer size..
> trailing "."
Done


Line 415: PlanNode.minSpillableBufferBytes_ = 64 * 1024;
> Would it make sense to move this into RuntimeEnv. Setting a static variable
I didn't realise there was already a thing that existed for this purpose. Moved 
it there. Also changed it so that the default buffer size was derived from 
--read_size rather than hardcoded.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Increase explain level for TPC-H and TPC-DS to provide visibility into
the memory estimates on those data sets.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
12 files changed, 10,996 insertions(+), 875 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

> I think it might be best, if possible, to agree on the planner
 > mechanism first and then tune the policy based on experiments with
 > a more final version of query execution. I could do experiments
 > with the current prototype of query execution but it feels a bit
 > speculative until more pieces are in place.

Yeah, I meant against the new code, not the old. just wanted to think through 
(and verify) the implications, but it doesn't have to block checking it in if 
we all agree this is a good strategy.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: bufferSize = Math.min(bufferSize, 
Math.max(minSpillableBufferBytes_,
Estimates are still conservative but more accurate than before - great! The one 
place where I think we are still too aggressive is 
PlanFragment.getNumInstances(). We "blindly" assume that we can get the maximum 
MT_DOP, but I think it will be common not to. The effective MT_DOP is limited 
by the number of scan ranges in the left-most scan, so ideally we'd take that 
into account in PlanFragment.getNumInstances(). I don't feel strongly on doing 
this now or later, but it's something to be aware of.


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186: || numNodes_ == 0) {
What about numNodes_ == -1?


Line 190:   perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_
For broadcast, it seems we should count the full HT size only once per node and 
not per instance.


Line 193: perInstanceDataBytes /= 
fragment_.getNumInstances(queryOptions.getMt_dop());
getNumInstances() could return -1


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java
File fe/src/main/java/org/apache/impala/util/BitUtil.java:

Line 1: package org.apache.impala.util;
Apache header


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 409: // vary the spillable buffer size..
trailing "."


Line 415: PlanNode.minSpillableBufferBytes_ = 64 * 1024;
Would it make sense to move this into RuntimeEnv. Setting a static variable in 
PlanNode seems a little weird.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

The main scenario where it could lead to additional spilling (assuming 
mem_limit is fixed) is if the scans grabbed memory that the spillable operators 
would otherwise have been able to use. Otherwise it could result in different 
operators spilling, which could have positive or negative impacts on perf. If 
we get it right we'll reduce spilling because more memory can be devoted to the 
operators that need it.

I ran a few TPC-H queries to compare estimated memory with peak memory. E.g. 
TPC-H Q9 is below. The estimates are off by quite a bit in some cases, but the 
error comes mainly from the non-linear memory consumption of the 64kb/512kb/8mb 
buffer size ramp-up in the old code, rather than cardinality estimation errors.

Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   
Peak Mem  Est. Peak Mem  Detail  

---
21:MERGING-EXCHANGE1  149.369us  149.369us  175  61.70K 
 0  0  UNPARTITIONED   
12:SORT3  410.974us  547.810us  175  61.70K   
24.02 MB   16.00 MB  
20:AGGREGATE   31.261ms1.306ms  175  61.70K
2.30 MB   10.00 MB  FINALIZE
19:EXCHANGE3   68.482us   81.851us  525  61.70K 
 0  0  HASH(nation,o_year) 
11:AGGREGATE   3  201.760ms  217.919ms  525  61.70K
2.11 MB   10.00 MB  STREAMING   
10:HASH JOIN   3   26.906ms   37.907ms  319.40K 574.29K
1.06 MB   690.00 B  INNER JOIN, BROADCAST   
|--18:EXCHANGE 3   17.306us   17.720us   25  25 
 0  0  BROADCAST   
|  05:SCAN HDFS1   17.458ms   17.458ms   25  25   
43.00 KB   32.00 MB  tpch.nation 
09:HASH JOIN   3  114.307ms  155.488ms  319.40K 574.29K  
169.40 MB   20.14 MB  INNER JOIN, BROADCAST   
|--17:EXCHANGE 3  284.583ms  296.048ms  800.00K 800.00K 
 0  0  BROADCAST   
|  03:SCAN HDFS1  373.590ms  373.590ms  800.00K 800.00K   
32.46 MB  176.00 MB  tpch.partsupp   
08:HASH JOIN   3   39.011ms   43.315ms  319.40K 574.29K
1.52 MB  107.42 KB  INNER JOIN, BROADCAST   
|--16:EXCHANGE 32.050ms2.485ms   10.00K  10.00K 
 0  0  BROADCAST   
|  01:SCAN HDFS1  775.848ms  775.848ms   10.00K  10.00K
2.04 MB   32.00 MB  tpch.supplier   
07:HASH JOIN   3  722.690ms  736.289ms  319.40K 574.29K  
153.17 MB   17.83 MB  INNER JOIN, PARTITIONED 
|--15:EXCHANGE 3  187.995ms  194.122ms1.50M   1.50M 
 0  0  HASH(o_orderkey)
|  04:SCAN HDFS2  318.553ms  512.326ms1.50M   1.50M   
42.05 MB  176.00 MB  tpch.orders 
14:EXCHANGE3   41.321ms   62.755ms  319.40K 598.58K 
 0  0  HASH(l_orderkey)
06:HASH JOIN   3  256.056ms  273.355ms  319.40K 598.58K
1.47 MB1.19 MB  INNER JOIN, BROADCAST   
|--13:EXCHANGE 31.733ms1.972ms   10.66K  20.00K 
 0  0  BROADCAST   
|  00:SCAN HDFS1  821.018ms  821.018ms   10.66K  20.00K   
32.04 MB   64.00 MB  tpch.part   
02:SCAN HDFS   32s611ms2s690ms6.00M   6.00M   
64.23 MB  264.00 MB  tpch.lineitem   


It sounds like we would need to test this with the new code - using the old 
code won't give any insights because it will be dominated by behaviour that 
doesn't carry over. Interestingly we might actually win back a fair bit of 
memory and therefore reduce spilling by avoiding the big jump in memory 
consumption of 8MB buffers.

I think it might be best, if possible, to agree on the planner mechanism first 
and then tune the policy based on experiments with a more final version of 
query execution. I could do experiments with the current prototype of query 
execution but it feels a bit speculative until more pieces are in place.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
G

[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

> I updated the patchset to use explain_level EXTENDED for TPC-H and
 > TPC-DS. This doesn't enable the spill buffer size optimisation for
 > those test, but you can inspect 'mem-estimate' to get a rough idea
 > of what the reservation would be with the new code.
 > 
 > It looks like for the scale factor 1 TPC-H and TPC-DS most joins
 > would have their reservation scaled down, since mem-estimate <<< 17
 > * 2MB buffers.

But then the question is does the scaled down reservation impact perf or cause 
more spilling?

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 6:

I updated the patchset to use explain_level EXTENDED for TPC-H and TPC-DS. This 
doesn't enable the spill buffer size optimisation for those test, but you can 
inspect 'mem-estimate' to get a rough idea of what the reservation would be 
with the new code.

It looks like for the scale factor 1 TPC-H and TPC-DS most joins would have 
their reservation scaled down, since mem-estimate <<< 17 * 2MB buffers.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Increase explain level for TPC-H and TPC-DS to provide visibility into
the memory estimates on those data sets.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
11 files changed, 10,963 insertions(+), 869 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 5:

(1 comment)

Yes I agree it would be interesting to look at.

One option is to switch more of the planner tests so that they actually show 
resource estimates - the accuracy of the memory estimates tracks this closely.

http://gerrit.cloudera.org:8080/#/c/6963/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

PS4, Line 77: _
> missing underscore.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
8 files changed, 919 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-26 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 4:

(1 comment)

Would be interesting to see how this works on some real datasets.

http://gerrit.cloudera.org:8080/#/c/6963/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

PS4, Line 77:  
missing underscore.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 4:

Posting for discussion. The approach used is quite simple.

The main risk of this approach is if the cardinality estimate is too low, we 
may reserve less memory for the join/agg than needed. If there is memory 
pressure overall in the query, this could potentially force a join/agg to spill 
and do small I/Os, which could be slow.

A minimum spillable buffer size query option will provide a safety valve if the 
planner sets the buffer size too low.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
8 files changed, 919 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong