On 25/08/2022 at 19:01, Larry White wrote:
Hi all,

Thank you, Antoine and everyone for the feedback. It's been very helpful.
The proposal has been updated to incorporate suggested changes and clarify
as needed.

Several people have expressed support for the idea of using a Java version
of ChunkedArrays as the internal representation. I'm wondering if a
complete implementation of ChunkedArray is needed to achieve the
performance benefits that you mention in this thread. In my reading of the
API, data streamed as RecordBatches are converted to ChunkedArrays in a
One-RecordBatch-to-One-ChunkedArray fashion. This suggests that the
complexity of managing chunks of different shapes isn't strictly required.
Is that your understanding?

Yes, that is right. The ability to have chunks of different shapes is a C++ design decision, but it doesn't constrain other implementations. So you could instead eschew ChunkedArray and have a Table be a sequence of record batches, for example.
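
To make the "Table as a sequence of record batches" idea concrete, here is a rough sketch in plain Java. It deliberately avoids the Arrow Java API: SimpleBatch and BatchTable are hypothetical names made up for illustration, and columns are plain int[] arrays rather than Arrow vectors. It only shows the shape of the design, in which every batch shares one schema and a table-level row index is resolved by walking the batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a record batch: equal-length int columns under one schema. */
final class SimpleBatch {
    final List<String> columnNames;
    final List<int[]> columns;
    final int rowCount;

    SimpleBatch(List<String> columnNames, List<int[]> columns) {
        if (columnNames.size() != columns.size()) {
            throw new IllegalArgumentException("one array is required per column name");
        }
        int rows = columns.isEmpty() ? 0 : columns.get(0).length;
        for (int[] column : columns) {
            if (column.length != rows) {
                throw new IllegalArgumentException("all columns in a batch must have the same length");
            }
        }
        this.columnNames = List.copyOf(columnNames);
        this.columns = List.copyOf(columns);
        this.rowCount = rows;
    }
}

/** Hypothetical table: an ordered list of batches sharing a schema, with no per-column chunking. */
final class BatchTable {
    private final List<String> columnNames;
    private final List<SimpleBatch> batches = new ArrayList<>();
    private long rowCount = 0;

    BatchTable(List<String> columnNames) {
        this.columnNames = List.copyOf(columnNames);
    }

    /** Appends a batch as-is; no data is copied or re-chunked. */
    void append(SimpleBatch batch) {
        if (!batch.columnNames.equals(columnNames)) {
            throw new IllegalArgumentException("batch schema does not match table schema");
        }
        batches.add(batch);
        rowCount += batch.rowCount;
    }

    long rowCount() {
        return rowCount;
    }

    int batchCount() {
        return batches.size();
    }

    /** Resolves a table-level row index by walking the batches in order. */
    int getInt(int columnIndex, long row) {
        if (row < 0 || row >= rowCount) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        long remaining = row;
        for (SimpleBatch batch : batches) {
            if (remaining < batch.rowCount) {
                return batch.columns.get(columnIndex)[(int) remaining];
            }
            remaining -= batch.rowCount;
        }
        throw new IllegalStateException("row index not found in any batch");
    }
}
```

Because every column of a batch has the same length, chunk boundaries are identical across columns, so a single offset computation per row works for all of them. The linear walk in getInt is only for brevity; a real implementation would more likely keep cumulative row offsets and use a binary search.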

Regards

Antoine.
