Hi Ji, IMO, for the correct order, the validity buffer should precede the offset buffer (e.g. this is the order used by BaseVariableWidthVector & BaseLargeVariableWidthVector). In ListVector#getBuffers, the offset buffer precedes the validity buffer, so I am a little confused why you say the order of ListVector#getBuffers is right?
Best, Liya Fan On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > FWIW, I lack historical context on how these methods evolved, so I'd > appreciate insight from anyone who has worked on the java codebase for a > longer period of time. The current situation seems less then ideal. > > On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote: > > > Hi all, > > > > > > When I worked on ARROW-7539[1], I met some problems and not sure what's > the > > proper way to solve it. > > > > > > This issue was about to avoid set reader/writer indices in > > FieldVector#getFieldBuffers according to the following reasons: > > > > i. getBuffers set reader/writer indices and it's right for the purpose of > > sending the data over the wire > > > > ii. getFieldBuffers is distinct from getBuffers, it should be for getting > > access to underlying data for higher-performance algorithms > > > > > > Currently in VectorUnloader, we used getFieldBuffers to create > > ArrowRecordBatch that's why we keep writer/reader indices in > > getFieldBuffers > > , we should use getBuffers instead. But during the change, we found > another > > problem: > > > > The order of validity and offset buffers are not in the same order in > > ListVector(getBuffers's order is right), changing the API in > VectorUnloader > > creates problems with serialization/deserialization resulting in test > > failures in Dremio which would break backward compatibility with existing > > serialised files. > > > > > > Micah gives a solution but seems doesn't reach consistent in the PR > > thread[2 > > ]: > > > > 1. Remove setReaderWriterIndeces in getFieldBuffers > > 2. Deprecate getBuffers > > 3. Introduce a new getIpcBuffers which is unambiguously used for > writing > > record batches (i.e. in VectorUnloader). > > 4. Update documentation where it makes sense based on all this > > conversation. > > > > > > More details and discussions can be seen from the PR and hope to get more > > feedback. > > > > > > > > Thanks, > > > > Ji Liu > > > > > > > > [1] https://issues.apache.org/jira/browse/ARROW-7539 > > > > [2] https://github.com/apache/arrow/pull/6156 > > >