I opened a JIRA to describe what I think needs to be done here. Check it out:
https://issues.apache.org/jira/browse/ARROW-3191

On Fri, Sep 7, 2018 at 10:47 AM Wes McKinney <wesmck...@gmail.com> wrote:

> Seems like you should be able to construct an UnsafeDirectByteBuf from
> a MappedByteBuffer, and then wrap that with UnsafeDirectLittleEndian
> to get zero-copy access to a memory map. Does that sound right?
>
> https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java
>
> On Fri, Sep 7, 2018 at 12:46 PM Zhenyuan Zhao <zzy...@gmail.com> wrote:
> >
> > Interesting, so basically I can still use the public constructor
> >
> >   public ArrowBuf(AtomicInteger refCnt, BufferLedger ledger,
> >       UnsafeDirectLittleEndian byteBuf, BufferManager manager,
> >       ArrowByteBufAllocator alloc, int offset, int length, boolean isEmpty)
> >
> > and instead override BufferLedger/UnsafeDirectLittleEndian/BufferManager
> > to make it reference an existing buffer. That is a much more plausible
> > option, as it will reuse the Vectors. All I need is to implement my own
> > deserializer. Did I get you right?
> >
> > Thanks
> >
> > On Fri, Sep 7, 2018 at 7:09 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > > It is on purpose that ArrowBuf is final. It is done to ensure a single
> > > implementation and for performance reasons. ArrowBuf is primarily a
> > > memory address and a length and wants zero indirection when reading
> > > or writing that memory.
> > >
> > > It does, however, wrap several types of substructures as long as they
> > > have that property. For example, an ArrowBuf currently almost always
> > > wraps a Netty UnsafeDirectLittleEndian object. At that level you could
> > > propose a way to wrap more types of memory addresses+lengths.
> > >
> > > On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <zzy...@gmail.com> wrote:
> > >
> > > > Hello Team,
> > > >
> > > > I'm working on using Arrow as an intermediate format for transferring
> > > > columnar data from server to client. In this case, the client will
> > > > only need to read from the format, so I would like to avoid any
> > > > unnecessary copy of the data. Looking into Arrow, while
> > > > arrow-format/flatbuffers does support zero copy, the current
> > > > arrow-vector Java implementation does not. I was trying to hack zero
> > > > copy for read-only scenarios, but saw two main blockers:
> > > >
> > > > 1. ArrowBuf is the only buffer implementation used across
> > > >    ArrowReader/ArrowRecordBatch/Vectors. It's final, which means there
> > > >    isn't a way for me to override its logic in order to wrap some
> > > >    existing buffer. It's absolutely necessary to use ArrowBuf for write
> > > >    scenarios due to buffer allocation, but for reads I was hoping a
> > > >    vector could just serve as a view on top of an existing memory
> > > >    buffer (like a Java ByteBuffer or a Netty ByteBuf). That seems safe
> > > >    for the read-only case.
> > > >
> > > > 2. As a result of #1 described above, the only layer that seems
> > > >    reusable is arrow-format. I would then have to implement what is
> > > >    effectively a read-only copy of arrow-vector that references an
> > > >    existing buffer. Putting aside the effort of doing that, it
> > > >    introduces a big gap to keep up with future changes/fixes made to
> > > >    arrow-vector.
> > > >
> > > > Wondering if you guys have put any thought into such read-only
> > > > scenarios. Any suggestion on how I can approach this myself?
> > > >
> > > > Thanks
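The read-only "view on top of an existing memory buffer" idea from the original question can be sketched with plain JDK NIO. This is a minimal sketch under stated assumptions: a column of little-endian 32-bit ints sits at a known byte offset in a memory-mapped file. `ReadOnlyIntColumn`, `ZeroCopyViewDemo`, and the 8-byte header are hypothetical; the code does not use ArrowBuf, UnsafeDirectLittleEndian, or Netty, and it is not how arrow-vector works. It only shows that reads can go straight to mapped memory without copying.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical read-only view over a column of little-endian 32-bit ints
// stored in an existing buffer (heap, direct, or memory-mapped).
// Not an Arrow API: it only illustrates reading in place, without copying.
final class ReadOnlyIntColumn {
    private final ByteBuffer data; // shares memory with the source buffer
    private final int valueCount;

    ReadOnlyIntColumn(ByteBuffer source, int byteOffset, int valueCount) {
        // duplicate() shares the source's memory but has its own position/limit,
        // so narrowing the window below does not disturb the caller's buffer.
        ByteBuffer window = source.duplicate();
        window.limit(byteOffset + valueCount * Integer.BYTES);
        window.position(byteOffset);
        // slice() and asReadOnlyBuffer() still share memory; both reset the byte
        // order to big-endian, so little-endian is set last.
        this.data = window.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    int size() {
        return valueCount;
    }

    int get(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + valueCount);
        }
        // Absolute read straight from the shared memory; no bytes are copied.
        return data.getInt(index * Integer.BYTES);
    }
}

public class ZeroCopyViewDemo {
    public static void main(String[] args) throws IOException {
        int[] values = {7, 42, -1, 1000};
        int headerBytes = 8; // hypothetical header preceding the column

        // Build the file contents: 8 zero bytes, then the ints in little-endian order.
        ByteBuffer out = ByteBuffer.allocate(headerBytes + values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.position(headerBytes);
        for (int v : values) {
            out.putInt(v);
        }
        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, out.array());

        // Map the file read-only; the mapping stays valid after the channel closes.
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }

        ReadOnlyIntColumn column = new ReadOnlyIntColumn(mapped, headerBytes, values.length);
        for (int i = 0; i < column.size(); i++) {
            System.out.println(column.get(i));
        }
    }
}
```

Running `ZeroCopyViewDemo` prints 7, 42, -1 and 1000, one per line. The view is read-only by construction, which fits the read-only client case. Because it bypasses ArrowBuf entirely, it also cannot plug into the existing vector classes, which is the gap point 2 describes.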