GitHub user Ben-Zvi opened a pull request:
https://github.com/apache/drill/pull/938
DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement
The main change in this PR is adding a "_second way_" to handle memory
pressure for the Hash Aggregate: catch OOM failures when processing a
new input row (during put() into the Hash Table), clean up internally to allow a
retry (of the put()), and throw a new exception "**RetryAfterSpillException**".
In such a case the caller spills some partition to free more memory, and
retries inserting that new row.
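For illustration, a minimal sketch of the caller's retry loop; the class and
helper names here (InsertRetrySketch, spillAPartition) are stand-ins, not the PR's
actual code:

```java
/** A minimal sketch of the catch-spill-retry flow, with stand-in names. */
class InsertRetrySketch {
  static class RetryAfterSpillException extends Exception {}

  interface HashTable {
    void put(int rowIdx) throws RetryAfterSpillException;
  }

  void insertRow(HashTable table, int rowIdx) {
    while (true) {
      try {
        table.put(rowIdx);     // may hit an OOM internally
        return;                // inserted successfully
      } catch (RetryAfterSpillException e) {
        spillAPartition();     // free some memory, then retry the same row
      }
    }
  }

  void spillAPartition() { /* choose a partition and spill it to disk */ }
}
```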
In addition, to reduce the risk of OOM when either creating the "Values
Batch" (to match the "Keys Batch" in the Hash Table) or when allocating the
Outgoing vectors (for the Values), there are new "_reserves_" -- one reserve
for each of the two. A "_reserve_" is a memory amount subtracted from the
memory limit and added back to the limit just before it is needed, hopefully
preventing an OOM at that point. After the allocation the code tries to restore
that reserve (by subtracting from the limit again, if possible). The
"Outgoing Reserve" is always restored first; if the "Values Batch" reserve is
empty just before calling put(), we skip the put() (just as on an OOM there) and
spill to free some memory (and restore that reserve).
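A minimal sketch of the reserve bookkeeping; the field and method names are
invented for illustration, and the PR's actual accounting may differ:

```java
/** A sketch of the two "reserves" carved out of the memory limit. */
class MemoryReservesSketch {
  long memoryLimit;          // the allocator's current limit
  long reserveValuesBatch;   // held back for the "Values Batch"
  long reserveOutgoing;      // held back for the Outgoing vectors

  /** Return a reserve to the limit just before the allocation that needs it. */
  void useValuesBatchReserve() {
    memoryLimit += reserveValuesBatch;
    reserveValuesBatch = 0;
  }

  /** After the allocation, try to carve the reserves back out of the limit;
   *  the Outgoing reserve is always restored first. */
  void restoreReserves(long estOutgoing, long estValuesBatch, long memoryUsed) {
    long available = memoryLimit - memoryUsed;
    if (reserveOutgoing == 0 && available >= estOutgoing) {
      memoryLimit -= estOutgoing;
      reserveOutgoing = estOutgoing;
      available -= estOutgoing;
    }
    if (reserveValuesBatch == 0 && available >= estValuesBatch) {
      memoryLimit -= estValuesBatch;
      reserveValuesBatch = estValuesBatch;
    }
  }
}
```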
The old "_first way_" is still used. That is the code that predicts the
memory needs, and triggers a spill if not enough memory is available. The spill
code was separated into a new method called spillIfNeeded() which is used in
two modes - either the old way (prediction), or (when called from the new OOM
catch code) with a flag to force a spill, regardless of available memory. That
flag is also used to reduce the priority of the "current partition" when
choosing a partition to spill.
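A sketch of what such a dual-mode spillIfNeeded() can look like; the names and
the victim-selection stub are illustrative, not the PR's code:

```java
/** A sketch of spillIfNeeded() serving both the prediction path and the
 *  forced path taken after a caught OOM. */
class SpillerSketch {
  long estimatedBatchSize = 1 << 20;   // illustrative estimate

  long memoryAvailable() { return 0; }                                       // stub
  int chooseVictimPartition(int current, boolean avoidCurrent) { return 0; } // stub
  void spill(int partition) { /* write the partition's batches to disk */ }

  /** Old way: spill only if the prediction says memory is short.
   *  New way: forceSpill == true (after a caught OOM) spills unconditionally,
   *  and deprioritizes the current partition when picking a victim. */
  void spillIfNeeded(int currentPartition, boolean forceSpill) {
    if (!forceSpill && estimatedBatchSize <= memoryAvailable()) {
      return;
    }
    spill(chooseVictimPartition(currentPartition, /*avoidCurrent=*/forceSpill));
  }
}
```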
A new testing option was added (**hashagg_use_memory_prediction**,
default true); setting it to false disables the old "first way". This allows
stress testing of the OOM-handling code (which may not otherwise be exercised
under normal memory allocation).
The HashTable put() code was re-written to clean up partial changes in
case of an OOM, and the code around the call to put() was changed to catch the
new exception, spill, and retry. Note that this works for 1st phase aggregation
as well (returning rows early).
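A sketch of an OOM-safe put() under these assumptions; the helper names and
the rollback details are invented for illustration:

```java
/** A sketch of put() rolling back partial changes before signaling a retry. */
class HashTableSketch {
  static class RetryAfterSpillException extends Exception {}
  static class OutOfMemoryException extends RuntimeException {}

  int numEntries;

  /** Insert a key; on an OOM, undo any partial state so the caller can
   *  spill a partition and safely retry the same row. */
  void put(int rowIdx) throws RetryAfterSpillException {
    int entriesBefore = numEntries;
    try {
      addKeyToBatchHolder(rowIdx);   // may allocate memory and hit an OOM
      numEntries++;
    } catch (OutOfMemoryException e) {
      numEntries = entriesBefore;    // undo the partial insert
      freePartialAllocations();      // release whatever was allocated so far
      throw new RetryAfterSpillException();
    }
  }

  void addKeyToBatchHolder(int rowIdx) { /* allocate and copy the key */ }
  void freePartialAllocations() { /* clear partially allocated vectors */ }
}
```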
For the estimates (in addition to the old "max batch size" estimate),
there is now an estimate for the Values Batch and one for the Outgoing batch.
These are used for restoring the "reserves". These estimates may be resized up
in case actual allocations turn out bigger.
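A tiny sketch of that resize-up rule, assuming a single estimate field
(the field name is illustrative):

```java
/** Estimates only ratchet upward, so restored reserves stay large enough
 *  after an unexpectedly big allocation. */
class EstimateSketch {
  long estValuesBatchSize = 4 << 20;   // illustrative initial estimate (4 MB)

  void onActualAllocation(long actualSize) {
    estValuesBatchSize = Math.max(estValuesBatchSize, actualSize);
  }
}
```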
Other changes:
* Improved the "max batch size" estimation -- using the outgoing batch for
getting the correct schema (instead of the input batch).
The only information needed from the input batch is the "max average
column size" (see the change in RecordBatchSizer.java), giving a better
estimate for VarChars.
The size of the non-nullable bigint columns added into the Values Batch
when the aggregation is SUM, MIN or MAX is also computed (see changes in
HashAggBatch.java and HashAggregator.java). A sketch of this estimate appears
after this list.
* A "plain Java" subclass is used for the HashTable, because byte-code
manipulation breaks on the new template code (see ChainedHashTable.java).
* The three Configuration options were changed into System/Session
options: min_batches_per_partition, hashagg_max_memory,
hashagg_num_partitions
* There was a potential memory leak in the HashTable BatchHolder ctor:
vectors were added to the container only after a successful allocation, and
the container was cleared in case of OOM, so after a partial allocation the
already-allocated part was not accessible (and could not be freed). A sketch of
the fix appears after this list. Also (Paul's suggestion) some vector templates
were modified to clean up after any runtime error (including an OOM).
* Performance improvements: Eliminated the call to updateBatches() before
each hash computation (it is now called only when switching to a new
SpilledRecordBatch); this was a big overhead.
Also changed all the "setSafe" calls into "set" for the HashTable (those
nanoseconds add up, especially when rehashing) -- these bigint vectors need no
resizing. A set() vs. setSafe() sketch appears after this list.
* "(Spill) file not found" errors are now ignored while cleaning up (see
the cleanup sketch after this list).
* The unit tests were re-written in a more compact form, and a test using
the new option (disabling memory prediction to force the OOM-handling code)
was added; a test sketch appears after this list.
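Regarding the improved batch-size estimation above: a rough sketch of the
estimate's shape, with illustrative parameters (not Drill's actual arithmetic).
The widths come from the outgoing schema; the input batch contributes only the
max average width for variable-length columns:

```java
/** A rough sketch of the max-batch-size estimate. */
class BatchSizeEstimateSketch {
  long estimateMaxBatchSize(int[] fixedColumnWidths,   // from the outgoing schema
                            int numVarcharColumns,
                            int maxAvgVarcharWidth,    // the one input-batch input
                            int numBigIntValueColumns, // SUM/MIN/MAX values
                            int rowsPerBatch) {
    long rowWidth = 0;
    for (int w : fixedColumnWidths) { rowWidth += w; }
    rowWidth += (long) numVarcharColumns * maxAvgVarcharWidth;
    rowWidth += numBigIntValueColumns * 8L;   // bigint = 8 bytes, non-nullable
    return rowWidth * rowsPerBatch;
  }
}
```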
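Regarding the BatchHolder leak fix above: a minimal sketch of the
add-before-allocate pattern, using a stand-in ValueVector interface rather than
Drill's real vectors:

```java
import java.util.ArrayList;
import java.util.List;

/** Register each vector in the container BEFORE allocating it, so that
 *  clear() can free a partially allocated batch after an OOM. */
class BatchHolderSketch {
  interface ValueVector { void allocateNew(); void clear(); }

  final List<ValueVector> container = new ArrayList<>();

  BatchHolderSketch(List<ValueVector> vectors) {
    try {
      for (ValueVector v : vectors) {
        container.add(v);      // add first, so it is reachable for cleanup
        v.allocateNew();       // may throw on OOM
      }
    } catch (RuntimeException e) {
      for (ValueVector v : container) { v.clear(); }  // frees what was allocated
      throw e;
    }
  }
}
```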
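Regarding the setSafe-to-set change above: a plain-Java analogy of the
trade-off (Drill's actual vector mutators differ, but the idea is the same --
set() skips the per-call capacity check that setSafe() pays for):

```java
/** set() writes blindly; setSafe() re-checks (and grows) capacity first.
 *  For the HashTable's pre-allocated bigint vectors, set() is enough. */
class FixedVectorSketch {
  long[] data = new long[4096];          // pre-allocated to the full batch size

  void set(int index, long value) {      // cheapest path: no bounds growth
    data[index] = value;
  }

  void setSafe(int index, long value) {  // pays a check (and maybe a copy) per call
    while (index >= data.length) {
      data = java.util.Arrays.copyOf(data, data.length * 2);
    }
    data[index] = value;
  }
}
```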
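Regarding the ignored "file not found" above: a sketch of the best-effort
cleanup, with a stand-in file-system interface instead of the actual spill
file manager:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** A missing spill file during cleanup is ignored, not treated as fatal. */
class SpillCleanupSketch {
  interface Fs { void delete(String path) throws IOException; }  // stand-in

  void closeSpillFile(Fs fs, String path) throws IOException {
    try {
      fs.delete(path);
    } catch (FileNotFoundException e) {
      // already deleted (or never fully created); cleanup is best-effort
    }
  }
}
```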
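Regarding the new no-prediction test above: a hedged sketch of such a test.
The setOption()/runAndVerify() helpers are hypothetical stand-ins for Drill's
test fixtures, and the actual registered option names may carry a prefix:

```java
/** Forcing the OOM path: disable the "first way" so spills happen only
 *  via the catch-OOM / spill / retry code. */
class ForcedOomTestSketch {
  void testSpillWithoutPrediction() {
    setOption("hashagg_use_memory_prediction", "false"); // disable prediction
    setOption("hashagg_num_partitions", "4");            // fewer, larger partitions
    runAndVerify("SELECT department_id, SUM(salary) FROM cp.`employee.json` "
        + "GROUP BY department_id");
  }

  void setOption(String name, String value) { /* e.g. ALTER SESSION SET ... */ }
  void runAndVerify(String sql) { /* run the query; check results and spill count */ }
}
```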
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Ben-Zvi/drill DRILL-5694
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/938.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #938
----
commit 1a96bb39faf01b7665bd669d88494789693ed9b8
Author: Ben-Zvi <[email protected]>
Date: 2017-09-08T22:52:57Z
DRILL-5694: Handle OOM in HashAggr by spill and retry, plus performance
improvement
----