I think having a chunked array with multiple vector buffers would be ideal, similar to the C++ implementation. It might take a fair amount of work to add this, but it would open up a lot more functionality. As for the API, VectorSchemaRoot.concat(Collection<VectorSchemaRoot>) seems good to me.
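To make the chunked-array idea concrete, here is a rough sketch. It is hypothetical: `ChunkedIntArray` is a made-up name, not part of Arrow Java, and plain `int[]` chunks stand in for Arrow value vectors. Building it copies no element data. The cost is that every element access first has to find its chunk, here with a binary search over chunk start offsets. That is the kind of extra indirection Liya raises below for ArrowBuf, although in this sketch it sits at the array level rather than inside the buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: int[] chunks stand in for Arrow value vectors.
final class ChunkedIntArray {
    private final int[][] chunks;     // non-empty chunks, kept by reference (no copy)
    private final int[] startOffsets; // global index of each chunk's first element
    private final int length;

    ChunkedIntArray(List<int[]> input) {
        List<int[]> kept = new ArrayList<>();
        for (int[] chunk : input) {
            if (chunk.length > 0) {   // skip empty chunks so offsets stay strictly increasing
                kept.add(chunk);
            }
        }
        chunks = kept.toArray(new int[0][]);
        startOffsets = new int[chunks.length];
        int total = 0;
        for (int i = 0; i < chunks.length; i++) {
            startOffsets[i] = total;
            total += chunks[i].length;
        }
        length = total;
    }

    int length() {
        return length;
    }

    int get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        // Find the last chunk whose start offset is <= index (O(log #chunks)).
        int pos = Arrays.binarySearch(startOffsets, index);
        int chunk = pos >= 0 ? pos : -pos - 2;
        return chunks[chunk][index - startOffsets[chunk]];
    }
}
```

For example, `new ChunkedIntArray(Arrays.asList(new int[]{1, 2}, new int[]{3, 4, 5}))` has length 5, and `get(2)` reads the first element of the second chunk.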
On Thu, Nov 7, 2019 at 12:09 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Micah,
>
> Thanks for bringing this up.
>
> > 1. An efficient solution already exists? It seems like TransferPair
> > implementations could possibly be improved upon or have they already been
> > optimized?
>
> Fundamentally, memory copy is unavoidable, IMO, because the source and
> target memory regions are likely to be in non-contiguous regions.
> An alternative is to make ArrowBuf support a number of non-contiguous
> memory regions. However, that would harm the performance of ArrowBuf, and
> ArrowBuf is the core of the Arrow library.
>
> > 2. What the preferred API for doing this would be? Some options I can
> > think of:
> >
> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> > * VectorLoader.load(Collection<ArrowRecordBatch>)
>
> IMO, option 1 is required, as we have scenarios that need to concatenate
> vectors/VectorSchemaRoots (e.g. restore the complete dictionary from delta
> dictionaries).
> Options 2 and 3 are optional for us.
>
> Best,
> Liya Fan
>
> On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Hi,
> > A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048 for
> > having similar functionality to the Python APIs that allow for creating one
> > larger data structure from a series of record batches. I just wanted to
> > surface it here in case:
> > 1. An efficient solution already exists? It seems like TransferPair
> > implementations could possibly be improved upon or have they already been
> > optimized?
> > 2. What the preferred API for doing this would be? Some options I can
> > think of:
> >
> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> > * VectorLoader.load(Collection<ArrowRecordBatch>)
> >
> > Thanks,
> > Micah
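For comparison, the copy-based route in Liya's reply (concatenating into one contiguous set of buffers) could look roughly like this for a variable-width column. This is a hypothetical sketch, not Arrow Java code: `VarBinaryColumn` is a made-up name, each batch is modeled as an Arrow-style offsets array (length n + 1) plus a byte data array, and validity buffers are left out. It shows that the data bytes have to be copied, and that the offsets also have to be rebased so they point into the combined data buffer.

```java
import java.util.List;

// Hypothetical sketch: an Arrow-style variable-width column modeled with plain arrays.
// offsets has valueCount + 1 entries; value i spans data[offsets[i], offsets[i + 1]).
final class VarBinaryColumn {
    final int[] offsets;
    final byte[] data;

    VarBinaryColumn(int[] offsets, byte[] data) {
        this.offsets = offsets;
        this.data = data;
    }

    int valueCount() {
        return offsets.length - 1;
    }

    // Concatenate by copying every part into one contiguous column.
    static VarBinaryColumn concat(List<VarBinaryColumn> parts) {
        int totalValues = 0;
        int totalBytes = 0;
        for (VarBinaryColumn p : parts) {
            totalValues += p.valueCount();
            totalBytes += p.offsets[p.valueCount()] - p.offsets[0];
        }
        int[] offsets = new int[totalValues + 1];
        byte[] data = new byte[totalBytes];
        int valuePos = 0;
        int bytePos = 0;
        for (VarBinaryColumn p : parts) {
            int n = p.valueCount();
            int base = p.offsets[0];
            int byteLen = p.offsets[n] - base;
            // The unavoidable memory copy of the value bytes.
            System.arraycopy(p.data, base, data, bytePos, byteLen);
            // Rebase offsets so they point into the combined data array.
            for (int i = 0; i < n; i++) {
                offsets[valuePos + i] = p.offsets[i] - base + bytePos;
            }
            valuePos += n;
            bytePos += byteLen;
        }
        offsets[totalValues] = bytePos;
        return new VarBinaryColumn(offsets, data);
    }
}
```

Concatenating a part holding "a" and "bc" with a part holding "de" yields offsets {0, 1, 3, 5} over the bytes "abcde".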