hi liya,
Thanks for your careful review, it is a typo, the order of getBuffers is
wrong.

Fan Liya <liya.fa...@gmail.com> 于2020年8月5日周三 下午2:14写道:

> Hi Ji,
>
> IMO, for the correct order, the validity buffer should precede the offset
> buffer (e.g. this is the order used by BaseVariableWidthVector &
> BaseLargeVariableWidthVector).
> In ListVector#getBuffers, the offset buffer precedes the validity buffer,
> so I am a little confused why you say the order of ListVector#getBuffers is
> right?
>
> Best,
> Liya Fan
>
> On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > FWIW, I lack historical context on how these methods evolved, so I'd
> > appreciate insight from anyone who has worked on the java codebase for a
> > longer period of time.  The current situation seems less then ideal.
> >
> > On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > >
> > > When I worked on ARROW-7539[1], I met some problems and not sure what's
> > the
> > > proper way to solve it.
> > >
> > >
> > > This issue was about to avoid set reader/writer indices in
> > > FieldVector#getFieldBuffers according to the following reasons:
> > >
> > > i. getBuffers set reader/writer indices and it's right for the purpose
> of
> > > sending the data over the wire
> > >
> > > ii. getFieldBuffers is distinct from getBuffers, it should be for
> getting
> > > access to underlying data for higher-performance algorithms
> > >
> > >
> > > Currently in VectorUnloader, we used getFieldBuffers to create
> > > ArrowRecordBatch that's why we keep writer/reader indices in
> > > getFieldBuffers
> > > , we should use getBuffers instead. But during the change, we found
> > another
> > > problem:
> > >
> > > The order of validity and offset buffers are not in the same order in
> > > ListVector(getBuffers's order is right), changing the API in
> > VectorUnloader
> > > creates problems with serialization/deserialization resulting in test
> > > failures in Dremio which would break backward compatibility with
> existing
> > > serialised files.
> > >
> > >
> > > Micah gives a solution but seems doesn't reach consistent in the PR
> > > thread[2
> > > ]:
> > >
> > >    1. Remove setReaderWriterIndeces in getFieldBuffers
> > >    2. Deprecate getBuffers
> > >    3. Introduce a new getIpcBuffers which is unambiguously used for
> > writing
> > >    record batches (i.e. in VectorUnloader).
> > >    4. Update documentation where it makes sense based on all this
> > >    conversation.
> > >
> > >
> > > More details and discussions can be seen from the PR and hope to get
> more
> > > feedback.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Ji Liu
> > >
> > >
> > >
> > > [1] https://issues.apache.org/jira/browse/ARROW-7539
> > >
> > > [2] https://github.com/apache/arrow/pull/6156
> > >
> >
>

Reply via email to