[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Reviewed-on: http://gerrit.cloudera.org:8080/6963 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 984 insertions(+), 39 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/794/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Code-Review+2 carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 12: Removed the preconditions check - it was too strict for the case when the hash join was in a subplan. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Hello Impala Public Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6963 to look at the new patch set (#12). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 984 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/12 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 11: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 11: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/ -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Alex Behm has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 10: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: // default buffer size, e.g. with small dimension tables. > Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs Exactly, thanks. I think there might be some non-trivial plumbing involved, so better to do later. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 8: PS 10 is a rebase and commit message update -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#10). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 986 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/10 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Increase explain level for TPC-H and TPC-DS to provide visibility into the memory estimates on those data sets. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 986 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/9 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: // default buffer size, e.g. with small dimension tables. > That's correct. Ultimately the planner can only guess because we do the sch Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs to be fixed so it might be helpful to add any ideas you had in there. http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: if (getChild(1).getCardinality() == -1 || getChild(1).getAvgRowSize() == -1 > Agree Done. Made it check for <= 0 since there are a couple of bugs in the numNodes calculation that I fixed in https://gerrit.cloudera.org/#/c/7223/ where it could return 0. Line 190: } else { > Add a Preconditions check that asserts MT_DOP=0 with a comment that the est We still need to generate plans in planner tests, so I made the DCHECK conditional on whether it's a test environment. http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 629: protected final static long getDefaultSpillableBufferBytes() { > Why this indirection? To make the intent clear at the callsites. The HDFS read size and the spillable buffer size will be decoupled in IMPALA-3200. It will be easier to fix that up if we make the distinction in the planner already. http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: Line 2: select * > use straight_join for these tests Done Line 761: select string_col, count(*) > Add a similar test with no stats for the join case (or integrate into this I have a test above " Join with no stats for right input - should use default buffers." Or did you have an additional test in mind? -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Alex Behm has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: // default buffer size, e.g. with small dimension tables. > The effective MT_DOP would depend somewhat on how the scan ranges were divi That's correct. Ultimately the planner can only guess because we do the scheduling in the BE. I think we should be conservative and not underestimate, but at the same time the number of fragment instances can definitely not be higher than the total number of scan ranges. In your example, I don't think it's safe to assume that the 4 ranges would be spread across 4 hosts (but that's probably what the planner will think in terms of #hosts). Also I don't think the number of hosts and scan ranges will vary dramatically in practice, because the planner estimates the number of hosts based on the scan-range replicas. So with a replication factor of R and N scan ranges you cannot have more than R*N hosts. Ultimately the number of hosts is not very reliable either. I think addressing this comment may be a little involved, so how about filing a JIRA and deferring for now? http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: if (getChild(1).getCardinality() == -1 || getChild(1).getAvgRowSize() == -1 > I'm not sure what this was about - I just kept this from the existing code. Agree Line 190: } else { > This might need some more thought about how this should work with parallel Add a Preconditions check that asserts MT_DOP=0 with a comment that the estimation is not quite right with parallel plans? http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 629: protected final static long getDefaultSpillableBufferBytes() { Why this indirection? http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: Line 2: select * use straight_join for these tests Line 761: select string_col, count(*) Add a similar test with no stats for the join case (or integrate into this one) -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 8: (1 comment) Any thoughts? http://gerrit.cloudera.org:8080/#/c/6963/8//COMMIT_MSG Commit Message: Line 21: Increase explain level for TPC-H and TPC-DS to provide visibility into Need to remove -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 8: Cleaned up this patch, removing the increased explain level of the existing planner tests. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Increase explain level for TPC-H and TPC-DS to provide visibility into the memory estimates on those data sets. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 10 files changed, 983 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/8 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: bufferSize = Math.min(bufferSize, Math.max(minSpillableBufferBytes_, > Estimates are still conservative but more accurate than before - great! The The effective MT_DOP would depend somewhat on how the scan ranges were divided between hosts too, right? E.g. if we had 4 scan ranges across 100 hosts, and mt_dop=4, could the planner assume that the scan ranges end up on different hosts? http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: || numNodes_ == 0) { > What about numNodes_ == -1? I'm not sure what this was about - I just kept this from the existing code. We don't actually use num_nodes_ in the calculation so maybe it's defunct and we should instead use num_instances_. Line 190: perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_ > For broadcast, it seems we should count the full HT size only once per node This might need some more thought about how this should work with parallel plans and separate join builds. It seems like once that's separated out more, we would compute the resource requirement for the single instance of the join build sink, rather than divide the requirement between the different fragment instances. Line 193: perInstanceDataBytes /= fragment_.getNumInstances(queryOptions.getMt_dop()); > getNumInstances() could return -1 Done http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java File fe/src/main/java/org/apache/impala/util/BitUtil.java: Line 1: package org.apache.impala.util; > Apache header Done http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 409: // vary the spillable buffer size.. > trailing "." Done Line 415: PlanNode.minSpillableBufferBytes_ = 64 * 1024; > Would it make sense to move this into RuntimeEnv. Setting a static variable I didn't realise there was already a thing that existed for this purpose. Moved it there. Also changed it so that the default buffer size was derived from --read_size rather than hardcoded. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Increase explain level for TPC-H and TPC-DS to provide visibility into the memory estimates on those data sets. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test 12 files changed, 10,996 insertions(+), 875 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/7 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Dan Hecht has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: > I think it might be best, if possible, to agree on the planner > mechanism first and then tune the policy based on experiments with > a more final version of query execution. I could do experiments > with the current prototype of query execution but it feels a bit > speculative until more pieces are in place. Yeah, I meant against the new code, not the old. just wanted to think through (and verify) the implications, but it doesn't have to block checking it in if we all agree this is a good strategy. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Alex Behm has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: bufferSize = Math.min(bufferSize, Math.max(minSpillableBufferBytes_, Estimates are still conservative but more accurate than before - great! The one place where I think we are still too aggressive is PlanFragment.getNumInstances(). We "blindly" assume that we can get the maximum MT_DOP, but I think it will be common not to. The effective MT_DOP is limited by the number of scan ranges in the left-most scan, so ideally we'd take that into account in PlanFragment.getNumInstances(). I don't feel strongly on doing this now or later, but it's something to be aware of. http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: || numNodes_ == 0) { What about numNodes_ == -1? Line 190: perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_ For broadcast, it seems we should count the full HT size only once per node and not per instance. Line 193: perInstanceDataBytes /= fragment_.getNumInstances(queryOptions.getMt_dop()); getNumInstances() could return -1 http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java File fe/src/main/java/org/apache/impala/util/BitUtil.java: Line 1: package org.apache.impala.util; Apache header http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 409: // vary the spillable buffer size.. trailing "." Line 415: PlanNode.minSpillableBufferBytes_ = 64 * 1024; Would it make sense to move this into RuntimeEnv. Setting a static variable in PlanNode seems a little weird. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: The main scenario where it could lead to additional spilling (assuming mem_limit is fixed) is if the scans grabbed memory that the spillable operators would otherwise have been able to use. Otherwise it could result in different operators spilling, which could have positive or negative impacts on perf. If we get it right we'll reduce spilling because more memory can be devoted to the operators that need it. I ran a few TPC-H queries to compare estimated memory with peak memory. E.g. TPC-H Q9 is below. The estimates are off by quite a bit in some cases, but the error comes mainly from the non-linear memory consumption of the 64kb/512kb/8mb buffer size ramp-up in the old code, rather than cardinality estimation errors. Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 21:MERGING-EXCHANGE1 149.369us 149.369us 175 61.70K 0 0 UNPARTITIONED 12:SORT3 410.974us 547.810us 175 61.70K 24.02 MB 16.00 MB 20:AGGREGATE 31.261ms1.306ms 175 61.70K 2.30 MB 10.00 MB FINALIZE 19:EXCHANGE3 68.482us 81.851us 525 61.70K 0 0 HASH(nation,o_year) 11:AGGREGATE 3 201.760ms 217.919ms 525 61.70K 2.11 MB 10.00 MB STREAMING 10:HASH JOIN 3 26.906ms 37.907ms 319.40K 574.29K 1.06 MB 690.00 B INNER JOIN, BROADCAST |--18:EXCHANGE 3 17.306us 17.720us 25 25 0 0 BROADCAST | 05:SCAN HDFS1 17.458ms 17.458ms 25 25 43.00 KB 32.00 MB tpch.nation 09:HASH JOIN 3 114.307ms 155.488ms 319.40K 574.29K 169.40 MB 20.14 MB INNER JOIN, BROADCAST |--17:EXCHANGE 3 284.583ms 296.048ms 800.00K 800.00K 0 0 BROADCAST | 03:SCAN HDFS1 373.590ms 373.590ms 800.00K 800.00K 32.46 MB 176.00 MB tpch.partsupp 08:HASH JOIN 3 39.011ms 43.315ms 319.40K 574.29K 1.52 MB 107.42 KB INNER JOIN, BROADCAST |--16:EXCHANGE 32.050ms2.485ms 10.00K 10.00K 0 0 BROADCAST | 01:SCAN HDFS1 775.848ms 775.848ms 10.00K 10.00K 2.04 MB 32.00 MB tpch.supplier 07:HASH JOIN 3 722.690ms 736.289ms 319.40K 574.29K 153.17 MB 17.83 MB INNER JOIN, PARTITIONED |--15:EXCHANGE 3 187.995ms 194.122ms1.50M 1.50M 0 0 HASH(o_orderkey) | 04:SCAN HDFS2 318.553ms 512.326ms1.50M 1.50M 42.05 MB 176.00 MB tpch.orders 14:EXCHANGE3 41.321ms 62.755ms 319.40K 598.58K 0 0 HASH(l_orderkey) 06:HASH JOIN 3 256.056ms 273.355ms 319.40K 598.58K 1.47 MB1.19 MB INNER JOIN, BROADCAST |--13:EXCHANGE 31.733ms1.972ms 10.66K 20.00K 0 0 BROADCAST | 00:SCAN HDFS1 821.018ms 821.018ms 10.66K 20.00K 32.04 MB 64.00 MB tpch.part 02:SCAN HDFS 32s611ms2s690ms6.00M 6.00M 64.23 MB 264.00 MB tpch.lineitem It sounds like we would need to test this with the new code - using the old code won't give any insights because it will be dominated by behaviour that doesn't carry over. Interestingly we might actually win back a fair bit of memory and therefore reduce spilling by avoiding the big jump in memory consumption of 8MB buffers. I think it might be best, if possible, to agree on the planner mechanism first and then tune the policy based on experiments with a more final version of query execution. I could do experiments with the current prototype of query execution but it feels a bit speculative until more pieces are in place. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong G
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Dan Hecht has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: > I updated the patchset to use explain_level EXTENDED for TPC-H and > TPC-DS. This doesn't enable the spill buffer size optimisation for > those test, but you can inspect 'mem-estimate' to get a rough idea > of what the reservation would be with the new code. > > It looks like for the scale factor 1 TPC-H and TPC-DS most joins > would have their reservation scaled down, since mem-estimate <<< 17 > * 2MB buffers. But then the question is does the scaled down reservation impact perf or cause more spilling? -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 6: I updated the patchset to use explain_level EXTENDED for TPC-H and TPC-DS. This doesn't enable the spill buffer size optimisation for those test, but you can inspect 'mem-estimate' to get a rough idea of what the reservation would be with the new code. It looks like for the scale factor 1 TPC-H and TPC-DS most joins would have their reservation scaled down, since mem-estimate <<< 17 * 2MB buffers. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Increase explain level for TPC-H and TPC-DS to provide visibility into the memory estimates on those data sets. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test 11 files changed, 10,963 insertions(+), 869 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/6 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 5: (1 comment) Yes I agree it would be interesting to look at. One option is to switch more of the planner tests so that they actually show resource estimates - the accuracy of the memory estimates tracks this closely. http://gerrit.cloudera.org:8080/#/c/6963/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: PS4, Line 77: _ > missing underscore. Done -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 8 files changed, 919 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/5 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Dan Hecht has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 4: (1 comment) Would be interesting to see how this works on some real datasets. http://gerrit.cloudera.org:8080/#/c/6963/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: PS4, Line 77: missing underscore. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 4: Posting for discussion. The approach used is quite simple. The main risk of this approach is if the cardinality estimate is too low, we may reserve less memory for the join/agg than needed. If there is memory pressure overall in the query, this could potentially force a join/agg to spill and do small I/Os, which could be slow. A minimum spillable buffer size query option will provide a safety valve if the planner sets the buffer size too low. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 8 files changed, 919 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/4 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong