GitHub user ppadma opened a pull request:
https://github.com/apache/drill/pull/1125
DRILL-6126: Allocate memory for value vectors upfront in flatten operator
Changed the flatten operator to allocate memory upfront based on
sizing calculations.
This requires allocating a single column (which can be nested) for a
particular record count and allocation hints. Refactored the code a bit
for that.
Instead of assuming a worst-case fragmentation factor of 2, changed the logic
to round the calculated number of rows down to the nearest power of two. This
will allow us to pack value vectors more densely and will help with memory
utilization.
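
To illustrate the rounding described above, here is a minimal sketch. It is
not the code from this PR: the class and method names are hypothetical, and it
assumes an allocator that rounds each buffer up to a power-of-two size. Under
that assumption, a power-of-two row count with a power-of-two value width
fills the buffer exactly.

```java
public final class RowCountRounding {
    private RowCountRounding() {}

    /** Largest power of two that is <= rows; returns 0 for non-positive input. */
    static int roundDownToPowerOfTwo(int rows) {
        return rows <= 0 ? 0 : Integer.highestOneBit(rows);
    }

    /**
     * Smallest power of two that is >= bytes. This models the assumed
     * allocator behavior (buffer sizes rounded up to a power of two).
     */
    static long roundUpToPowerOfTwo(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        return Long.highestOneBit(bytes - 1) << 1;
    }

    public static void main(String[] args) {
        int valueWidth = 8;        // bytes per value, e.g. a 64-bit integer column
        int estimatedRows = 1000;  // hypothetical result of the sizing calculation

        int rows = roundDownToPowerOfTwo(estimatedRows);
        long needed = (long) rows * valueWidth;
        long allocated = roundUpToPowerOfTwo(needed);

        System.out.println(rows + " rows need " + needed
            + " bytes; allocated buffer is " + allocated + " bytes");
    }
}
```

With the rounded row count, the bytes needed and the bytes allocated are the
same, so none of the buffer is left unused.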
RepeatedMapVector and RepeatedListVector are extended from
RepeatedFixedWidthVectorLike. This is wrong and causes problems in the
allocation logic (more specifically, allocatePrecomputedChildCount in
AllocationHelper). Fixed that.
This PR has 2 commits: one for all of the above and a second one for
DRILL-6162: Enhance record batch sizer to retain nesting information.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ppadma/drill DRILL-6126
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1125.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1125
----
commit 58c6b9ad584e56c71d982feaaa43ad32b5011eef
Author: Padma Penumarthy <ppenumar97@...>
Date: 2018-02-21T17:33:12Z
DRILL-6162: Enhance record batch sizer to retain nesting information for
map columns.
commit f7c09131179b75d10ffe195785c9aef3b9c7ed97
Author: Padma Penumarthy <ppenumar97@...>
Date: 2018-02-21T17:35:47Z
DRILL-6126: Allocate memory for value vectors upfront in flatten operator
----
---