Hello Tim Armstrong, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16240
to look at the new patch set (#5).
Change subject: WIP IMPALA-9955,IMPALA-9957: Fix not enough reservation for
large read/write pages in GroupingAggregator
..
WIP IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write
pages in GroupingAggregator
The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.
To be specific, we save extra reservation for writing a large page. It's
only used when we run out of unused reservation and fail to increase the
reservation to fit the large page. Currently there are two cases in
non-streaming GroupingAggregator. One case is when we start to spill a
partition and a serialize stream is needed to write some large pages.
The other case is when we have spilled all partitions in a repartition
process and need to write a large page to a spilled partition. Note that
each spilled partition in the repartition process still keeps the
default_page_size worth of reservation for writing a default page. We
can only restore the extra reservation when a partition is actually
writing a large page, and then reclaim it after the writing.
The same for extra reservation for reading a large page. In the
repartition process, we may read large pages from the input stream (from
a previous spilled partition). When it needs to pin the current large
page, we restore the extra reservation, and then reclaim it when the
attached row batch is reset.
This patch also fixes the wrong assumption that non-streaming
GroupingAggregator only requires one buffer reservation for the hash
tables. The minimal spillable buffer size is 64KB, while the minimal
requirement of a non-streaming GroupingAggregator's hash tables is
num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB.
We should reserve more buffers when the spillable buffer size is small.
Fix some planner test failures due to this change.
Tests:
- Add tests in test_spilling.py to verify GroupingAggregator works in
min reservation.
Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
17 files changed, 486 insertions(+), 176 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/5
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong