GitHub user Ben-Zvi opened a pull request:
https://github.com/apache/drill/pull/938
DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement
The main change in this PR is adding a "_second way_" to handle memory
pressure for the Hash Aggregate: catch OOM failures when processing a
new input row (during put() into the Hash Table), clean up internally to allow a
retry (of the put()), and throw a new exception "**RetryAfterSpillException**".
In such a case the caller spills some partition to free more memory, and
retries inserting that new row.
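For illustration, a minimal sketch of the caller's retry loop; the class and
helper names here (InsertRetrySketch, spillAPartition) are stand-ins, not the PR's
actual code:

```java
/** A minimal sketch of the catch-spill-retry flow, with stand-in names. */
class InsertRetrySketch {
  static class RetryAfterSpillException extends Exception {}

  interface HashTable {
    void put(int rowIdx) throws RetryAfterSpillException;
  }

  void insertRow(HashTable table, int rowIdx) {
    while (true) {
      try {
        table.put(rowIdx);     // may hit an OOM internally
        return;                // inserted successfully
      } catch (RetryAfterSpillException e) {
        spillAPartition();     // free some memory, then retry the same row
      }
    }
  }

  void spillAPartition() { /* choose a partition and spill it to disk */ }
}
```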
In addition, to reduce the risk of OOM when either creating the "Values
Batch" (to match the "Keys Batch" in the Hash Table) or when allocating the
Outgoing vectors (for the Values), there are new "_reserves_" -- one reserve
for each of the two. A "_reserve_" is a memory amount subtracted from the
memory limit and added back to the limit just before it is needed, hopefully
preventing an OOM at that point. After the allocation the code tries to restore
that reserve (by subtracting from the limit again, if possible). The
"Outgoing Reserve" is always restored first; if the "Values Batch" reserve is
empty just before calling put(), we skip the put() (just as on an OOM there) and
spill to free some memory (and restore that reserve).
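A minimal sketch of the reserve bookkeeping; the field and method names are
invented for illustration, and the PR's actual accounting may differ:

```java
/** A sketch of the two "reserves" carved out of the memory limit. */
class MemoryReservesSketch {
  long memoryLimit;          // the allocator's current limit
  long reserveValuesBatch;   // held back for the "Values Batch"
  long reserveOutgoing;      // held back for the Outgoing vectors

  /** Return a reserve to the limit just before the allocation that needs it. */
  void useValuesBatchReserve() {
    memoryLimit += reserveValuesBatch;
    reserveValuesBatch = 0;
  }

  /** After the allocation, try to carve the reserves back out of the limit;
   *  the Outgoing reserve is always restored first. */
  void restoreReserves(long estOutgoing, long estValuesBatch, long memoryUsed) {
    long available = memoryLimit - memoryUsed;
    if (reserveOutgoing == 0 && available >= estOutgoing) {
      memoryLimit -= estOutgoing;
      reserveOutgoing = estOutgoing;
      available -= estOutgoing;
    }
    if (reserveValuesBatch == 0 && available >= estValuesBatch) {
      memoryLimit -= estValuesBatch;
      reserveValuesBatch = estValuesBatch;
    }
  }
}
```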
The old "_first way_" is still used. That is the code that predicts the
memory needs, and triggers a spill if not enough memory is available. The spill
code was separated into a new method called spillIfNeeded() which is used in
two modes - either the old way (prediction), or (when called from the new OOM
catch code) with a flag to force a spill, regardless of available memory. That
flag is also used to reduce the priority of the "current partition" when
choosing a partition to spill.
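A sketch of what such a dual-mode spillIfNeeded() can look like; the names and
the victim-selection stub are illustrative, not the PR's code:

```java
/** A sketch of spillIfNeeded() serving both the prediction path and the
 *  forced path taken after a caught OOM. */
class SpillerSketch {
  long estimatedBatchSize = 1 << 20;   // illustrative estimate

  long memoryAvailable() { return 0; }                                       // stub
  int chooseVictimPartition(int current, boolean avoidCurrent) { return 0; } // stub
  void spill(int partition) { /* write the partition's batches to disk */ }

  /** Old way: spill only if the prediction says memory is short.
   *  New way: forceSpill == true (after a caught OOM) spills unconditionally,
   *  and deprioritizes the current partition when picking a victim. */
  void spillIfNeeded(int currentPartition, boolean forceSpill) {
    if (!forceSpill && estimatedBatchSize <= memoryAvailable()) {
      return;
    }
    spill(chooseVictimPartition(currentPartition, /*avoidCurrent=*/forceSpill));
  }
}
```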
A new testing option was added (**hashagg_use_memory_prediction**,
default true); setting it to false disables the old "first way". This allows
stress testing of the OOM-handling code (which may not otherwise be exercised
under normal memory allocation).
The HashTable put() code was re-written to clean up partial changes in
case of an OOM, and the code around the call to put() was changed to catch the
new exception, spill, and retry. Note that this works for 1st phase aggregation
as well (returning rows early).
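A sketch of an OOM-safe put() under these assumptions; the helper names and
the rollback details are invented for illustration:

```java
/** A sketch of put() rolling back partial changes before signaling a retry. */
class HashTableSketch {
  static class RetryAfterSpillException extends Exception {}
  static class OutOfMemoryException extends RuntimeException {}

  int numEntries;

  /** Insert a key; on an OOM, undo any partial state so the caller can
   *  spill a partition and safely retry the same row. */
  void put(int rowIdx) throws RetryAfterSpillException {
    int entriesBefore = numEntries;
    try {
      addKeyToBatchHolder(rowIdx);   // may allocate memory and hit an OOM
      numEntries++;
    } catch (OutOfMemoryException e) {
      numEntries = entriesBefore;    // undo the partial insert
      freePartialAllocations();      // release whatever was allocated so far
      throw new RetryAfterSpillException();
    }
  }

  void addKeyToBatchHolder(int rowIdx) { /* allocate and copy the key */ }
  void freePartialAllocations() { /* clear partially allocated vectors */ }
}
```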
For the estimates (in addition to the old "max batch size" estimate),
there is now an estimate for the Values Batch and one for the Outgoing batch.
These are used for restoring the "reserves". These estimates may be resized up
in case actual allocations turn out bigger.
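A tiny sketch of that resize-up rule, assuming a single estimate field
(the field name is illustrative):

```java
/** Estimates only ratchet upward, so restored reserves stay large enough
 *  after an unexpectedly big allocation. */
class EstimateSketch {
  long estValuesBatchSize = 4 << 20;   // illustrative initial estimate (4 MB)

  void onActualAllocation(long actualSize) {
    estValuesBatchSize = Math.max(estValuesBatchSize, actualSize);
  }
}
```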
Other changes:
* Improved the "max batch size" estimation -- using the outgoing batch for
getting the correct schema (instead of the input batch).
The only information needed from the input batch is the "max average
column size" (see the change in RecordBatchSizer.java), giving a better
estimate for VarChars.
The size of the non-nullable bigint columns added into the Values Batch
when the aggregation is SUM, MIN or MAX is also computed (see changes in
HashAggBatch.java and HashAggregator.java). A sketch of this estimate appears
after this list.
* A "plain Java" subclass is used for the HashTable, because byte-code
manipulation breaks on the new template code (see ChainedHashTable.java).
* The three Configuration options were changed into System/Session
options: min_batches_per_partition, hashagg_max_memory,
hashagg_num_partitions
* There was a potential memory leak in the HashTable BatchHolder ctor:
vectors were added to the container only after a successful allocation, and
the container was cleared in case of OOM, so after a partial allocation the
already-allocated part was not accessible (and could not be freed). A sketch of
the fix appears after this list. Also (Paul's suggestion) some vector templates
were modified to clean up after any runtime error (including an OOM).
* Performance improvements: Eliminated the call to updateBatches() before
each hash computation (it is now called only when switching to a new
SpilledRecordBatch); this was a big overhead.
Also changed all the "setSafe" calls into "set" for the HashTable (those
nanoseconds add up, especially when rehashing) -- these bigint vectors need no
resizing. A set() vs. setSafe() sketch appears after this list.
* "(Spill) file not found" errors are now ignored while cleaning up (see
the cleanup sketch after this list).
* The unit tests were re-written in a more compact form, and a test using
the new option (disabling memory prediction to force the OOM-handling code)
was added; a test sketch appears after this list.
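Regarding the improved batch-size estimation above: a rough sketch of the
estimate's shape, with illustrative parameters (not Drill's actual arithmetic).
The widths come from the outgoing schema; the input batch contributes only the
max average width for variable-length columns:

```java
/** A rough sketch of the max-batch-size estimate. */
class BatchSizeEstimateSketch {
  long estimateMaxBatchSize(int[] fixedColumnWidths,   // from the outgoing schema
                            int numVarcharColumns,
                            int maxAvgVarcharWidth,    // the one input-batch input
                            int numBigIntValueColumns, // SUM/MIN/MAX values
                            int rowsPerBatch) {
    long rowWidth = 0;
    for (int w : fixedColumnWidths) { rowWidth += w; }
    rowWidth += (long) numVarcharColumns * maxAvgVarcharWidth;
    rowWidth += numBigIntValueColumns * 8L;   // bigint = 8 bytes, non-nullable
    return rowWidth * rowsPerBatch;
  }
}
```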
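Regarding the BatchHolder leak fix above: a minimal sketch of the
add-before-allocate pattern, using a stand-in ValueVector interface rather than
Drill's real vectors:

```java
import java.util.ArrayList;
import java.util.List;

/** Register each vector in the container BEFORE allocating it, so that
 *  clear() can free a partially allocated batch after an OOM. */
class BatchHolderSketch {
  interface ValueVector { void allocateNew(); void clear(); }

  final List<ValueVector> container = new ArrayList<>();

  BatchHolderSketch(List<ValueVector> vectors) {
    try {
      for (ValueVector v : vectors) {
        container.add(v);      // add first, so it is reachable for cleanup
        v.allocateNew();       // may throw on OOM
      }
    } catch (RuntimeException e) {
      for (ValueVector v : container) { v.clear(); }  // frees what was allocated
      throw e;
    }
  }
}
```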
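Regarding the setSafe-to-set change above: a plain-Java analogy of the
trade-off (Drill's actual vector mutators differ, but the idea is the same --
set() skips the per-call capacity check that setSafe() pays for):

```java
/** set() writes blindly; setSafe() re-checks (and grows) capacity first.
 *  For the HashTable's pre-allocated bigint vectors, set() is enough. */
class FixedVectorSketch {
  long[] data = new long[4096];          // pre-allocated to the full batch size

  void set(int index, long value) {      // cheapest path: no bounds growth
    data[index] = value;
  }

  void setSafe(int index, long value) {  // pays a check (and maybe a copy) per call
    while (index >= data.length) {
      data = java.util.Arrays.copyOf(data, data.length * 2);
    }
    data[index] = value;
  }
}
```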
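Regarding the ignored "file not found" above: a sketch of the best-effort
cleanup, with a stand-in file-system interface instead of the actual spill
file manager:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** A missing spill file during cleanup is ignored, not treated as fatal. */
class SpillCleanupSketch {
  interface Fs { void delete(String path) throws IOException; }  // stand-in

  void closeSpillFile(Fs fs, String path) throws IOException {
    try {
      fs.delete(path);
    } catch (FileNotFoundException e) {
      // already deleted (or never fully created); cleanup is best-effort
    }
  }
}
```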
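Regarding the new no-prediction test above: a hedged sketch of such a test.
The setOption()/runAndVerify() helpers are hypothetical stand-ins for Drill's
test fixtures, and the actual registered option names may carry a prefix:

```java
/** Forcing the OOM path: disable the "first way" so spills happen only
 *  via the catch-OOM / spill / retry code. */
class ForcedOomTestSketch {
  void testSpillWithoutPrediction() {
    setOption("hashagg_use_memory_prediction", "false"); // disable prediction
    setOption("hashagg_num_partitions", "4");            // fewer, larger partitions
    runAndVerify("SELECT department_id, SUM(salary) FROM cp.`employee.json` "
        + "GROUP BY department_id");
  }

  void setOption(String name, String value) { /* e.g. ALTER SESSION SET ... */ }
  void runAndVerify(String sql) { /* run the query; check results and spill count */ }
}
```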
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Ben-Zvi/drill DRILL-5694
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/938.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #938
----
commit 1a96bb39faf01b7665bd669d88494789693ed9b8
Author: Ben-Zvi <[email protected]>
Date: 2017-09-08T22:52:57Z
DRILL-5694: Handle OOM in HashAggr by spill and retry, plus performance
improvement
----