Hi Ji,

IMO, for the correct order, the validity buffer should precede the offset
buffer (e.g. this is the order used by BaseVariableWidthVector &
BaseLargeVariableWidthVector).
In ListVector#getBuffers, the offset buffer precedes the validity buffer,
so I am a little confused why you say the order of ListVector#getBuffers is
right?

Best,
Liya Fan

On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> FWIW, I lack historical context on how these methods evolved, so I'd
> appreciate insight from anyone who has worked on the java codebase for a
> longer period of time.  The current situation seems less then ideal.
>
> On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote:
>
> > Hi all,
> >
> >
> > When I worked on ARROW-7539[1], I met some problems and not sure what's
> the
> > proper way to solve it.
> >
> >
> > This issue was about to avoid set reader/writer indices in
> > FieldVector#getFieldBuffers according to the following reasons:
> >
> > i. getBuffers set reader/writer indices and it's right for the purpose of
> > sending the data over the wire
> >
> > ii. getFieldBuffers is distinct from getBuffers, it should be for getting
> > access to underlying data for higher-performance algorithms
> >
> >
> > Currently in VectorUnloader, we used getFieldBuffers to create
> > ArrowRecordBatch that's why we keep writer/reader indices in
> > getFieldBuffers
> > , we should use getBuffers instead. But during the change, we found
> another
> > problem:
> >
> > The order of validity and offset buffers are not in the same order in
> > ListVector(getBuffers's order is right), changing the API in
> VectorUnloader
> > creates problems with serialization/deserialization resulting in test
> > failures in Dremio which would break backward compatibility with existing
> > serialised files.
> >
> >
> > Micah gives a solution but seems doesn't reach consistent in the PR
> > thread[2
> > ]:
> >
> >    1. Remove setReaderWriterIndeces in getFieldBuffers
> >    2. Deprecate getBuffers
> >    3. Introduce a new getIpcBuffers which is unambiguously used for
> writing
> >    record batches (i.e. in VectorUnloader).
> >    4. Update documentation where it makes sense based on all this
> >    conversation.
> >
> >
> > More details and discussions can be seen from the PR and hope to get more
> > feedback.
> >
> >
> >
> > Thanks,
> >
> > Ji Liu
> >
> >
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-7539
> >
> > [2] https://github.com/apache/arrow/pull/6156
> >
>

Reply via email to