Ji Liu created ARROW-5259:
-----------------------------
Summary: Add option for ValueVector to allocate buffers with
actual size
Key: ARROW-5259
URL: https://issues.apache.org/jira/browse/ARROW-5259
Project: Apache Arrow
Issue Type: Wish
Reporter: Ji Liu
Assignee: Ji Liu
Currently, _BaseValueVector#computeCombinedBufferSize_ calculates the buffer
size from _valueCount_ and _typeWidth_, and the result is used to allocate
memory for the dataBuffer and validityBuffer. However, the allocated memory is
almost always larger than the actual size, because the result is rounded up by
_BaseAllocator.nextPowerOfTwo(bufferSize)_.
For example, an IntVector with valueCount = 1025 needs roughly 4.2 KB (4100
bytes of data plus 129 bytes of validity bits) but allocates 8192 bytes, almost
double what it actually uses. So in some cases there is enough memory for the
actual use, but an OOM is thrown once the allocation is rounded up to the next
power of 2. I think this problem is entirely avoidable.
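The arithmetic behind the 1025-value example can be sketched as follows. This is a standalone approximation, not Arrow's source: it assumes the validity buffer uses one bit per value, that the validity and data sizes are each padded to a multiple of 8 bytes before being summed, and that the sum is then rounded up to a power of two. `BufferSizeSketch` and its methods are hypothetical names.

```java
// Standalone sketch of the sizing arithmetic (assumption: approximates
// computeCombinedBufferSize; not copied from Arrow).
public final class BufferSizeSketch {

    /** Round up to the next multiple of 8 bytes (assumed padding rule). */
    static long roundUp8(long size) {
        return (size + 7) & ~7L;
    }

    /** Smallest power of two >= size; returns size unchanged if it already is one. */
    static long nextPowerOfTwo(long size) {
        long highest = Long.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    /** Validity buffer: one bit per value, rounded up to whole bytes. */
    static long validityBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    /** Bytes needed for validity + data, each padded to 8 bytes. */
    static long actualSize(int valueCount, int typeWidth) {
        return roundUp8(validityBytes(valueCount))
                + roundUp8((long) valueCount * typeWidth);
    }

    /** Size after the extra power-of-two rounding step. */
    static long allocatedSize(int valueCount, int typeWidth) {
        return nextPowerOfTwo(actualSize(valueCount, typeWidth));
    }

    public static void main(String[] args) {
        int valueCount = 1025;
        int typeWidth = 4; // IntVector stores 4-byte ints
        System.out.println("actual=" + actualSize(valueCount, typeWidth)
                + " allocated=" + allocatedSize(valueCount, typeWidth));
        // prints "actual=4240 allocated=8192"
    }
}
```

Under these assumptions the padded requirement is 136 + 4104 = 4240 bytes, which the power-of-two step turns into 8192.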
Would it be feasible to add an option for ValueVector to allocate buffers of
the actual size, rather than rounding up to the next power of 2, to reduce
memory allocation?
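One possible shape for such an option is a sizing policy chosen by the caller. The sketch below is purely illustrative: `SizingOptionSketch`, `AllocationSizing`, and `finalBufferSize` are hypothetical names, not part of Arrow's API, and the 4240-byte input assumes the padded validity + data size for 1025 four-byte values.

```java
// Hypothetical sizing option; none of these names exist in Arrow.
public final class SizingOptionSketch {

    /** Whether to keep the current rounding or allocate the exact size. */
    enum AllocationSizing { POWER_OF_TWO, EXACT }

    /** Smallest power of two >= size; returns size unchanged if it already is one. */
    static long nextPowerOfTwo(long size) {
        long highest = Long.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    /** Final buffer size for a given required size and sizing policy. */
    static long finalBufferSize(long requiredBytes, AllocationSizing sizing) {
        return sizing == AllocationSizing.EXACT
                ? requiredBytes
                : nextPowerOfTwo(requiredBytes);
    }

    public static void main(String[] args) {
        long required = 4240; // assumed padded size for 1025 ints (validity + data)
        for (AllocationSizing sizing : AllocationSizing.values()) {
            System.out.println(sizing + " -> " + finalBufferSize(required, sizing));
        }
    }
}
```

Keeping POWER_OF_TWO as the default would preserve today's behavior, while EXACT avoids the up-to-2x overhead; whether exact sizes interact well with the allocator and later reallocation would still need to be checked.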
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)