One use-case for ChunkedArray that comes to my mind is external sort for large vectors.
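
To make that a bit more concrete, here is a rough sketch of the idea in plain Python. It does not use any Arrow API: sort_runs and merge_runs are names made up for illustration, the "chunks" are ordinary lists standing in for vectors, and a real external sort would spill the sorted runs to disk instead of keeping them in memory. The point is only that each chunk is sorted on its own and the sorted chunks are then merged lazily, without first being copied into one contiguous vector.

```python
import heapq
from typing import Iterable, Iterator, List


def sort_runs(values: Iterable[int], run_size: int) -> List[List[int]]:
    """Split the input into runs of at most run_size values and sort each run.

    Each sorted run plays the role of one chunk of a chunked array.
    """
    if run_size <= 0:
        raise ValueError("run_size must be positive")
    runs: List[List[int]] = []
    current: List[int] = []
    for value in values:
        current.append(value)
        if len(current) == run_size:
            runs.append(sorted(current))
            current = []
    if current:
        # The last run may be shorter than run_size.
        runs.append(sorted(current))
    return runs


def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Lazily k-way merge the sorted runs without concatenating them first."""
    return heapq.merge(*runs)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
runs = sort_runs(data, run_size=4)
merged = list(merge_runs(runs))
print(runs)
print(merged)
```

Only the final list(...) call materializes a single contiguous result; a consumer that can iterate over the merged stream never needs that copy.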

Best,
Liya Fan

On Fri, Nov 15, 2019 at 2:14 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> > Maybe Java can add the concept of Tables and ChunkedArrays sometime in
> > the future.
>
> Is there a concrete use-case here? It might pay to open up some JIRAs.
> I'm still not 100% clear on the rationale for the way VectorSchemaRoot is
> designed and how that would relate to Table/ChunkedArrays (or maybe they
> are completely separate)?
>
> On Tue, Nov 12, 2019 at 11:28 AM Bryan Cutler <cutl...@gmail.com> wrote:
>
> > Yes, you are correct. I think I was mixing up a couple of different
> > things. I like the way C++/Python distinguishes it, where a RecordBatch
> > is contiguous memory and a Table can be chunked. So since you are just
> > talking about RecordBatches, I think we should keep it contiguous, and
> > concat would require memcpy. Maybe Java can add the concept of Tables
> > and ChunkedArrays sometime in the future.
> >
> > On Mon, Nov 11, 2019 at 9:59 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >>> I think having a chunked array with multiple vector buffers would be
> >>> ideal, similar to C++. It might take a fair amount of work to add
> >>> this but would open up a lot more functionality.
> >>
> >> There are potentially two different use-cases. ChunkedArray is
> >> logical/lazy concatenation, whereas concat physically rebuilds the
> >> vectors to be a single vector.
> >>
> >> On Fri, Nov 8, 2019 at 10:51 AM Bryan Cutler <cutl...@gmail.com> wrote:
> >>
> >>> I think having a chunked array with multiple vector buffers would be
> >>> ideal, similar to C++. It might take a fair amount of work to add
> >>> this but would open up a lot more functionality. As for the API,
> >>> VectorSchemaRoot.concat(Collection<VectorSchemaRoot>) seems good to me.
> >>>
> >>> On Thu, Nov 7, 2019 at 12:09 AM Fan Liya <liya.fa...@gmail.com> wrote:
> >>>
> >>>> Hi Micah,
> >>>>
> >>>> Thanks for bringing this up.
> >>>>
> >>>> > 1. An efficient solution already exists? It seems like TransferPair
> >>>> > implementations could possibly be improved upon or have they
> >>>> > already been optimized?
> >>>>
> >>>> Fundamentally, memory copy is unavoidable, IMO, because the source and
> >>>> target memory regions are likely to be in non-contiguous regions.
> >>>> An alternative is to make ArrowBuf support a number of non-contiguous
> >>>> memory regions. However, that would harm the performance of ArrowBuf,
> >>>> and ArrowBuf is the core of the Arrow library.
> >>>>
> >>>> > 2. What the preferred API for doing this would be? Some options I
> >>>> > can think of:
> >>>> >
> >>>> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> >>>> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> >>>> > * VectorLoader.load(Collection<ArrowRecordBatch>)
> >>>>
> >>>> IMO, option 1 is required, as we have scenarios that need to
> >>>> concatenate vectors/VectorSchemaRoots (e.g. restore the complete
> >>>> dictionary from delta dictionaries).
> >>>> Options 2 and 3 are optional for us.
> >>>>
> >>>> Best,
> >>>> Liya Fan
> >>>>
> >>>> On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi,
> >>>> > A colleague opened up
> >>>> > https://issues.apache.org/jira/browse/ARROW-7048 for
> >>>> > having similar functionality to the Python APIs that allow for
> >>>> > creating one larger data structure from a series of record batches.
> >>>> > I just wanted to surface it here in case:
> >>>> > 1. An efficient solution already exists? It seems like TransferPair
> >>>> > implementations could possibly be improved upon or have they
> >>>> > already been optimized?
> >>>> > 2. What the preferred API for doing this would be? Some options I
> >>>> > can think of:
> >>>> >
> >>>> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> >>>> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> >>>> > * VectorLoader.load(Collection<ArrowRecordBatch>)
> >>>> >
> >>>> > Thanks,
> >>>> > Micah