Hi All,

The “batch size control” project seeks to further improve Drill’s internal 
memory management by controlling the size of batches created in Drill’s scan 
and “combining” operators. We’ve provided updates in the Hangouts. This note is 
for folks who may have missed those discussions.

Drill currently faces two pesky problems in certain data sets:

* Readers consume a fixed number of records (often 1K, 4K, or 64K). If an 
individual column happens to be “wide”, then Drill may allocate a vector larger 
than 16 MB. Because of how Drill uses the Netty memory allocator, such “jumbo” 
vectors can lead to direct memory fragmentation and premature OOM 
(out-of-memory) errors.
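To make the first problem concrete, here is a back-of-the-envelope sketch. The record count and value width below are assumptions chosen for illustration, not numbers from a real query:

```java
// Back-of-the-envelope sketch (assumed numbers): why one "wide" VARCHAR
// column overflows 16 MB at a 64K-record batch size.
public class JumboVectorExample {
    public static void main(String[] args) {
        int recordCount = 65536;    // 64K records per batch
        int avgValueWidth = 300;    // assumed average bytes per value
        long dataBytes = (long) recordCount * avgValueWidth;
        // Netty's allocator rounds allocations up to the next power of
        // two, so this vector would occupy a 32 MB block.
        long allocatedBytes = Long.highestOneBit(dataBytes - 1) << 1;
        System.out.println(dataBytes);      // 19660800 (~18.75 MB > 16 MB)
        System.out.println(allocatedBytes); // 33554432 (32 MB)
    }
}
```

At 300 bytes per value, the data alone is about 18.75 MB, and the power-of-two rounding doubles that to 32 MB of direct memory for a single vector.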

* Even if no individual column is wide, the overall row may be, resulting in 
very large record batches: too large, in fact, for the new memory-limited sort 
and hash agg operators to consume. (We’ve seen 100 MB batches sent to a sort 
limited to use 30 MB of memory.) This leads to an OOM error because the 
downstream operator exceeds the memory limit set by the planner.

The same problems can arise in “combining” operators such as flatten, project, 
join, and so on. These operators consume incoming batches and produce new, 
possibly larger, batches, which can cause the same memory issues described 
above for readers.

The goal of the “batch size control” project is to limit individual vectors to 
no more than 16 MB, and to limit overall batch size to some configured maximum. 
In practice, the maximum batch size will eventually be set by the Drill planner 
as part of its memory allocation planning.

The design is inspired by the set of “complex (vector) writers” that already 
exist in Drill. The existing vector writers (and thus the new set as well) are 
modeled after JSON structure builders. They write scalars, tuples and arrays. 
(“Tuple” is a fancy name for what Hive calls a “struct” and Drill calls a 
“map”. Both tuples and arrays occur in JSON-like data sources.)
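To illustrate the JSON-builder shape of the writer API, here is a toy sketch. The class and method names are invented for this note and are not Drill's actual interfaces; it builds a nested Java Map per row instead of filling value vectors:

```java
import java.util.*;

// Toy sketch of the JSON-builder style (invented names, not Drill's
// actual API): rows are built as nested maps and lists rather than
// by filling value vectors.
class ToyTupleWriter {
    final Map<String, Object> tuple = new LinkedHashMap<>();

    void setScalar(String col, Object value) { tuple.put(col, value); }

    ToyTupleWriter tuple(String col) {      // nested map (Hive "struct")
        ToyTupleWriter child = new ToyTupleWriter();
        tuple.put(col, child.tuple);
        return child;
    }

    List<Object> array(String col) {        // repeated (array) column
        List<Object> values = new ArrayList<>();
        tuple.put(col, values);
        return values;
    }
}

public class WriterDemo {
    public static void main(String[] args) {
        ToyTupleWriter row = new ToyTupleWriter();
        row.setScalar("name", "fred");                // scalar column
        row.tuple("address").setScalar("city", "SF"); // tuple column
        row.array("phones").add("555-1212");          // array column
        System.out.println(row.tuple);
        // prints {name=fred, address={city=SF}, phones=[555-1212]}
    }
}
```

The real writers follow the same nesting pattern, but write into fixed- and variable-width vectors and track the bytes they consume as they go.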

Unlike the existing set, the new set also monitors memory use and takes the 
required actions. The key challenge in limiting size is that we don’t know 
ahead of time the size of the data we wish to write to a vector, nor can most 
readers “backtrack” and rewrite a record if we discover one of the values 
causes a vector to overflow.

The key contribution of the new vector writers is to monitor vector and batch 
sizes. They employ an algorithm to handle “overflow” when a value is discovered 
to be too large for the remaining space. The vector writers roll the current 
row over to a new batch, allow the reader to continue writing the current 
record, then tell the reader that the batch is full and should be sent 
downstream. When the reader starts with the next batch, the batch will already 
contain the “overflow” row that ended the prior batch.
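The rollover logic can be sketched as follows. This is a toy model (invented names, String "vectors", a per-batch byte budget) of the overflow algorithm described above, not the actual implementation in the PR:

```java
import java.util.*;

// Toy sketch of overflow handling (invented names): when a value would
// push the batch past its byte limit, the completed rows are shipped as
// a full batch and the in-progress row rolls over to a fresh batch.
class OverflowingBatchWriter {
    final int batchLimitBytes;
    List<String> batch = new ArrayList<>();        // completed rows
    List<String> currentRow = new ArrayList<>();   // row being written
    int usedBytes;
    List<List<String>> fullBatches = new ArrayList<>();

    OverflowingBatchWriter(int limit) { this.batchLimitBytes = limit; }

    void writeValue(String v) {
        if (usedBytes + v.length() > batchLimitBytes) {
            // Overflow: send the completed rows downstream, then roll
            // the current (partial) row over into the new batch.
            fullBatches.add(batch);
            batch = new ArrayList<>();
            usedBytes = 0;
            for (String prior : currentRow) usedBytes += prior.length();
        }
        currentRow.add(v);
        usedBytes += v.length();
    }

    void saveRow() {   // the reader finished the record
        batch.add(String.join("|", currentRow));
        currentRow = new ArrayList<>();
    }
}

public class OverflowDemo {
    public static void main(String[] args) {
        OverflowingBatchWriter w = new OverflowingBatchWriter(10);
        w.writeValue("aaaa"); w.writeValue("bbbb"); w.saveRow(); // 8 bytes
        w.writeValue("cc");   // fits exactly (10 bytes used)
        w.writeValue("dd");   // overflow: row 1 ships, "cc" rolls over
        w.saveRow();
        System.out.println(w.fullBatches);  // [[aaaa|bbbb]]
        System.out.println(w.batch);        // [cc|dd]
    }
}
```

Note that the reader never sees the overflow: it keeps writing the current record as if nothing happened, and the partial row simply reappears at the start of the next batch.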

By using the new mechanism, readers and other operators gain control over their 
batch sizes without the need for each reader to implement its own solution.

There is much more to the solution (such as consolidating projection handling 
in the scan operator). These topics can be the subject of future updates.

For more information, see:

* DRILL-5211: https://issues.apache.org/jira/browse/DRILL-5211
* Drill PR #914: https://github.com/apache/drill/pull/914

We welcome your comments, questions, and suggestions. Please post them to the 
JIRA ticket or PR above, or to this dev list.

Thanks,

- Paul
