hi liya, Thanks for your careful review, it is a typo, the order of getBuffers is wrong.
Fan Liya <liya.fa...@gmail.com> 于2020年8月5日周三 下午2:14写道: > Hi Ji, > > IMO, for the correct order, the validity buffer should precede the offset > buffer (e.g. this is the order used by BaseVariableWidthVector & > BaseLargeVariableWidthVector). > In ListVector#getBuffers, the offset buffer precedes the validity buffer, > so I am a little confused why you say the order of ListVector#getBuffers is > right? > > Best, > Liya Fan > > On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > FWIW, I lack historical context on how these methods evolved, so I'd > > appreciate insight from anyone who has worked on the java codebase for a > > longer period of time. The current situation seems less then ideal. > > > > On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote: > > > > > Hi all, > > > > > > > > > When I worked on ARROW-7539[1], I met some problems and not sure what's > > the > > > proper way to solve it. > > > > > > > > > This issue was about to avoid set reader/writer indices in > > > FieldVector#getFieldBuffers according to the following reasons: > > > > > > i. getBuffers set reader/writer indices and it's right for the purpose > of > > > sending the data over the wire > > > > > > ii. getFieldBuffers is distinct from getBuffers, it should be for > getting > > > access to underlying data for higher-performance algorithms > > > > > > > > > Currently in VectorUnloader, we used getFieldBuffers to create > > > ArrowRecordBatch that's why we keep writer/reader indices in > > > getFieldBuffers > > > , we should use getBuffers instead. But during the change, we found > > another > > > problem: > > > > > > The order of validity and offset buffers are not in the same order in > > > ListVector(getBuffers's order is right), changing the API in > > VectorUnloader > > > creates problems with serialization/deserialization resulting in test > > > failures in Dremio which would break backward compatibility with > existing > > > serialised files. > > > > > > > > > Micah gives a solution but seems doesn't reach consistent in the PR > > > thread[2 > > > ]: > > > > > > 1. Remove setReaderWriterIndeces in getFieldBuffers > > > 2. Deprecate getBuffers > > > 3. Introduce a new getIpcBuffers which is unambiguously used for > > writing > > > record batches (i.e. in VectorUnloader). > > > 4. Update documentation where it makes sense based on all this > > > conversation. > > > > > > > > > More details and discussions can be seen from the PR and hope to get > more > > > feedback. > > > > > > > > > > > > Thanks, > > > > > > Ji Liu > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/ARROW-7539 > > > > > > [2] https://github.com/apache/arrow/pull/6156 > > > > > >