Ji Liu created ARROW-5259:
-----------------------------

             Summary: Add option for ValueVector to allocate buffers with 
actual size
                 Key: ARROW-5259
                 URL: https://issues.apache.org/jira/browse/ARROW-5259
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Ji Liu
            Assignee: Ji Liu


Currently, _BaseValueVector#computeCombinedBufferSize_ calculates the 
buffer size from _valueCount_ and _typeWidth_ and then allocates 
memory for dataBuffer and validityBuffer. However, unless the computed size 
is already a power of two, it allocates more memory than is actually needed, 
because the size is passed through _BaseAllocator.nextPowerOfTwo(bufferSize)_.

For example, an IntVector with valueCount = 1025 needs 1025 * 4 = 4100 bytes 
of data plus a small validity buffer, yet it allocates buffers of size 8192, 
so memory usage is almost double what is actually required. As a result, in 
some cases there is enough memory for actual use, but an OOM is thrown once 
the allocation is rounded up to the next power of 2. I think this problem is 
entirely avoidable.
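
The arithmetic behind the example above can be sketched as follows. This is 
a standalone illustration, not Arrow's actual code: the class and helper 
names are hypothetical, and it assumes the validity and data buffers are each 
rounded up to a multiple of 8 bytes before being summed.

```java
public class BufferSizeSketch {

    // Round a byte count up to the next multiple of 8 (assumed alignment).
    static long roundUp8(long size) {
        return (size + 7) / 8 * 8;
    }

    // Smallest power of two >= size, for size > 0.
    static long nextPowerOfTwo(long size) {
        long highest = Long.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    // Bytes actually needed: one validity bit per value plus the fixed-width data.
    static long exactSize(int valueCount, int typeWidth) {
        long validityBytes = (valueCount + 7L) / 8;
        long dataBytes = (long) valueCount * typeWidth;
        return roundUp8(validityBytes) + roundUp8(dataBytes);
    }

    // Bytes allocated once the exact size is rounded up to a power of two.
    static long combinedSize(int valueCount, int typeWidth) {
        return nextPowerOfTwo(exactSize(valueCount, typeWidth));
    }

    public static void main(String[] args) {
        int valueCount = 1025;
        int typeWidth = 4; // width of an int in bytes
        System.out.println("exact bytes:     " + exactSize(valueCount, typeWidth));
        System.out.println("allocated bytes: " + combinedSize(valueCount, typeWidth));
    }
}
```

Under these assumptions, 1025 ints need 4240 bytes but 8192 are allocated, 
while 1024 ints would need 4224 bytes and still receive 8192.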

Is it feasible to add an option for ValueVector to allocate the actual buffer 
size, rather than rounding it up to the next power of 2, to reduce memory 
allocation?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
