Thanks Siddharth. Your explanation is very helpful.
It's a bit strange, though, that reset() doesn't release the current
ArrowBuf. This means the size of the underlying ArrowBuf doesn't change
until the next realloc/allocateNew is called, so the vector could hold
more memory than needed after reset(). I suppose this is a trade-off
between the performance of reset() and memory efficiency?
> However, I think reset() method should instead do allocationSizeInBytes =
> INITIAL_VALUE_ALLOCATION * ${type_width} to reset to the actual initial
> value.
I agree this seems to be an issue. I created
https://issues.apache.org/jira/browse/ARROW-1296.
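
To make the arithmetic concrete, here is a toy Java sketch of the sizing
issue. It is not the actual FixedValueVectors template: ToyFixedVector and
its method names are made up for illustration, the doubling in realloc() is
only a stand-in for the real growth policy, and 4096 is inferred from the
16KB figure you gave for a 4-byte IntVector.

```java
// Toy model only; this is NOT the Arrow FixedValueVectors template.
final class ToyFixedVector {
    // Assumption: inferred from 16KB for a 4-byte IntVector (16384 / 4).
    static final int INITIAL_VALUE_ALLOCATION = 4096;

    private final int typeWidth;        // bytes per value
    private int allocationSizeInBytes;  // size the next allocateNew()/realloc() would use

    ToyFixedVector(int typeWidth) {
        this.typeWidth = typeWidth;
        this.allocationSizeInBytes = INITIAL_VALUE_ALLOCATION * typeWidth;
    }

    int allocationSizeInBytes() {
        return allocationSizeInBytes;
    }

    // Hypothetical growth step: doubling stands in for the real policy.
    void realloc() {
        allocationSizeInBytes *= 2;
    }

    // Behaviour described in this thread: the type width is dropped.
    void resetCurrent() {
        allocationSizeInBytes = INITIAL_VALUE_ALLOCATION;
    }

    // Proposed behaviour: return to the size used at construction.
    void resetProposed() {
        allocationSizeInBytes = INITIAL_VALUE_ALLOCATION * typeWidth;
    }

    public static void main(String[] args) {
        ToyFixedVector v = new ToyFixedVector(4);
        System.out.println(v.allocationSizeInBytes()); // 16384
        v.realloc();
        v.resetCurrent();
        System.out.println(v.allocationSizeInBytes()); // 4096
        v.resetProposed();
        System.out.println(v.allocationSizeInBytes()); // 16384
    }
}
```

With a type width of 4, the current reset leaves the size at a quarter of
what a freshly constructed vector would use.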
Thanks again,
Li
On Fri, Jul 28, 2017 at 6:59 PM, Siddharth Teotia <[email protected]>
wrote:
> Hi Li
>
> For FixedValueVectors.java template, the initial allocation will happen
> based on the value of allocationSizeInBytes. For example, for a 4 byte
> IntVector, this will be 16KB of memory which is equivalent to
> INITIAL_VALUE_ALLOCATION * ${type_width} in the code. So if the user
> invokes fixed_vector.allocateNew(), it will try to allocate memory based on
> this value of allocationSizeInBytes.
>
> The functions allocateNew(), allocateNewSafe(), realloc() consume the
> current value of allocationSizeInBytes and decide the actual size of memory
> to allocate (or re-allocate). They also change the value of
> allocationSizeInBytes after the allocation (or re-allocation) has been done
> successfully.
>
> The reset() function does allocationSizeInBytes = INITIAL_VALUE_ALLOCATION
> so that subsequent usage of alloc/realloc functions can start from this
> base value (the value at the time the vector was instantiated).
>
> However, I think reset() method should instead do allocationSizeInBytes =
> INITIAL_VALUE_ALLOCATION * ${type_width} to reset to the actual initial
> value.
>
> For NullableValueVectors.java template, these vectors actually delegate
> most of the calls to the underlying bit vector and value vector (which
> could be fixed-width or variable width). For this reason, I think the
> reset() method should call values.reset() on the corresponding value
> vector. Right now it resets only the bit vector.
>
> I hope this answers some of your questions.
>
> Thanks
> Siddharth
>
> On Fri, Jul 28, 2017 at 8:55 AM, Li Jin <[email protected]> wrote:
>
> > Hi All,
> >
> > I encountered this weirdness in the Arrow Java codebase that I hope
> > someone can help me understand.
> >
> > The reset() method of FixedValueVectors sets the allocation size to
> > INITIAL_VALUE_ALLOCATION.
> > I am wondering why it does that and how it handles the case where the
> > vector has been expanded through realloc.
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/main/codegen/templates/FixedValueVectors.java#L165
> >
> > For comparison, reset() in NullableValueVectors doesn't do that:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/main/codegen/templates/NullableValueVectors.java#L285
> >
> > Appreciate the help!
> >
> > Li
> >
>