I think having a chunked array with multiple vector buffers would be ideal,
similar to C++. It might take a fair amount of work to add this, but it would
open up a lot more functionality. As for the API,
VectorSchemaRoot.concat(Collection<VectorSchemaRoot>) seems good to me.
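
A minimal sketch of the chunked-array idea. It is written in plain Java, not against the Arrow Java library. The class `ChunkedIntVector` and its methods are hypothetical, and `int[]` chunks stand in for vector buffers. Concatenation only collects chunk references, so no values are copied. The trade-off is that random access must first find the chunk that holds a given index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical chunked vector: one logical array backed by several
 * independent buffers (int[] stands in for an Arrow vector buffer).
 */
final class ChunkedIntVector {
    private final List<int[]> chunks;
    // offsets[k] is the logical index of the first element of chunk k;
    // the last entry is the total length.
    private final long[] offsets;

    ChunkedIntVector(List<int[]> chunks) {
        this.chunks = new ArrayList<>(chunks);
        this.offsets = new long[this.chunks.size() + 1];
        for (int k = 0; k < this.chunks.size(); k++) {
            offsets[k + 1] = offsets[k] + this.chunks.get(k).length;
        }
    }

    long length() {
        return offsets[offsets.length - 1];
    }

    int numChunks() {
        return chunks.size();
    }

    /** Concatenation without copying: only chunk references are collected. */
    static ChunkedIntVector concat(List<ChunkedIntVector> parts) {
        List<int[]> all = new ArrayList<>();
        for (ChunkedIntVector part : parts) {
            all.addAll(part.chunks);
        }
        return new ChunkedIntVector(all);
    }

    /** Random access: locate the chunk by binary search over the offsets. */
    int get(long index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int pos = Arrays.binarySearch(offsets, index);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the chunk is insertionPoint - 1.
        int chunk = pos >= 0 ? pos : -pos - 2;
        // Empty chunks share their start offset with the next chunk, so skip past them.
        while (offsets[chunk + 1] <= index) {
            chunk++;
        }
        return chunks.get(chunk)[(int) (index - offsets[chunk])];
    }

    public static void main(String[] args) {
        ChunkedIntVector a = new ChunkedIntVector(List.of(new int[] {1, 2}, new int[] {3}));
        ChunkedIntVector b = new ChunkedIntVector(List.of(new int[] {4, 5, 6}));
        ChunkedIntVector all = concat(List.of(a, b));
        System.out.println(all.length() + " values in " + all.numChunks() + " chunks"); // 6 values in 3 chunks
        System.out.println(all.get(3)); // 4
    }
}
```

Combining a two-chunk vector with a one-chunk vector yields a three-chunk vector of six values. None of the underlying buffers are touched.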

On Thu, Nov 7, 2019 at 12:09 AM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Micah,
>
> Thanks for bringing this up.
>
> > 1.  An efficient solution already exists? It seems like TransferPair
> > implementations could possibly be improved upon, or have they already been
> > optimized?
>
> Fundamentally, memory copy is unavoidable, IMO, because the source and
> target memory regions are likely to be non-contiguous.
> An alternative is to make ArrowBuf support a number of non-contiguous
> memory regions. However, that would harm the performance of ArrowBuf, and
> ArrowBuf is the core of the Arrow library.
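
This minimal sketch illustrates why merging batches into a single buffer implies a copy. It uses plain Java, not the ArrowBuf API, and the `IntBatch` class is hypothetical. It merges fixed-width batches into one contiguous values array and one validity bitmap, assuming Arrow's least-significant-bit-first bitmap layout. Values can be block-copied. Validity bits usually land at an offset that is not a multiple of 8, so they are re-packed bit by bit.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical fixed-width batch: an int values buffer plus an Arrow-style
 * validity bitmap (bit i lives in byte i / 8 at bit position i % 8).
 */
final class IntBatch {
    final int[] values;
    final byte[] validity;

    IntBatch(int[] values, byte[] validity) {
        if (validity.length < (values.length + 7) / 8) {
            throw new IllegalArgumentException("validity bitmap too short");
        }
        this.values = values;
        this.validity = validity;
    }

    boolean isValid(int i) {
        return (validity[i >> 3] & (1 << (i & 7))) != 0;
    }

    /** Copies every batch into one contiguous values buffer and one bitmap. */
    static IntBatch concat(List<IntBatch> batches) {
        int total = 0;
        for (IntBatch b : batches) {
            total += b.values.length;
        }
        int[] values = new int[total];
        byte[] validity = new byte[(total + 7) / 8];
        int offset = 0;
        for (IntBatch b : batches) {
            // Fixed-width values: a plain block copy into the new buffer.
            System.arraycopy(b.values, 0, values, offset, b.values.length);
            // Validity bits: offset is usually not a multiple of 8, so bits
            // are re-packed one at a time rather than copied byte-for-byte.
            for (int i = 0; i < b.values.length; i++) {
                if (b.isValid(i)) {
                    int j = offset + i;
                    validity[j >> 3] |= (byte) (1 << (j & 7));
                }
            }
            offset += b.values.length;
        }
        return new IntBatch(values, validity);
    }

    public static void main(String[] args) {
        IntBatch a = new IntBatch(new int[] {10, 0, 30}, new byte[] {0b101}); // 10, null, 30
        IntBatch b = new IntBatch(new int[] {40, 50}, new byte[] {0b11});     // 40, 50
        IntBatch all = concat(List.of(a, b));
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < all.values.length; i++) {
            out.add(all.isValid(i) ? String.valueOf(all.values[i]) : "null");
        }
        System.out.println(out); // 10 null 30 40 50
    }
}
```
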
>
> > 2.  What the preferred API for doing this would be?  Some options I can
> > think of:
>
> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> > * VectorLoader.load(Collection<ArrowRecordBatch>)
>
> IMO, option 1 is required, as we have scenarios that need to concatenate
> vectors/VectorSchemaRoots (e.g. restoring the complete dictionary from delta
> dictionaries).
> Options 2 and 3 are optional for us.
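
This minimal sketch covers the delta-dictionary case. It assumes IPC semantics in which each delta batch appends its entries to the dictionary received so far. Plain `List<String>` values stand in for dictionary vectors, and `DeltaDictionary` and its methods are hypothetical. Under that assumption, restoring the full dictionary is a concatenation in arrival order. Indices from any batch then resolve against the combined result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for rebuilding a dictionary from a base batch plus deltas. */
final class DeltaDictionary {
    /** The full dictionary is the base batch followed by each delta, in arrival order. */
    static List<String> restore(List<String> base, List<List<String>> deltas) {
        List<String> full = new ArrayList<>(base);
        for (List<String> delta : deltas) {
            full.addAll(delta);
        }
        return full;
    }

    /** Decodes dictionary indices against the restored dictionary. */
    static List<String> decode(int[] indices, List<String> dictionary) {
        List<String> out = new ArrayList<>(indices.length);
        for (int idx : indices) {
            out.add(dictionary.get(idx));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> full = restore(
                List.of("red", "green"),
                List.of(List.of("blue"), List.of("cyan", "magenta")));
        System.out.println(full);                               // [red, green, blue, cyan, magenta]
        System.out.println(decode(new int[] {4, 0, 2}, full));  // [magenta, red, blue]
    }
}
```
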
>
> Best,
> Liya Fan
>
> On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > Hi,
> > A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048 for
> > having similar functionality to the Python APIs that allow for creating one
> > larger data structure from a series of record batches.  I just wanted to
> > surface it here in case:
> > 1.  An efficient solution already exists? It seems like TransferPair
> > implementations could possibly be improved upon or have they already been
> > optimized?
> > 2.  What the preferred API for doing this would be?  Some options I can
> > think of:
> >
> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> > * VectorLoader.load(Collection<ArrowRecordBatch>)
> >
> > Thanks,
> > Micah
> >
>
