tustvold commented on issue #3142:
URL: https://github.com/apache/arrow-rs/issues/3142#issuecomment-1321842532

   Like the general idea, just a couple of comments/questions:
   
   * If the extend payload is RecordBatch, what do you gain by concatenating 
them together? Why not just store them separately and periodically compact 
them? What do you gain from a single RecordBatch over say `Vec<RecordBatch>`?
   * Similar to the above, why is this an AppendableRecordBatch and not say 
AppendablePrimitiveArray, etc... This would be more flexible and avoid creating 
array temporaries
   * I'm not sure how support for booleans would work, unless you can only 
append multiples of 8
   * You need to know the maximum buffer lengths up front, as you can't realloc 
the buffers
   * Your comment suggests arrow2 supports this, but I can't see how. Could you 
point me to it?
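
To illustrate the boolean concern: validity and boolean buffers are bit-packed, so an append that starts at a bit offset that is not a multiple of 8 cannot be a plain byte copy. The sketch below uses only the standard library; `append_bits` is a hypothetical helper for illustration, not an arrow-rs API, and it assumes LSB-first bit order with unused trailing bits kept at zero.

```rust
/// Append `src_len` bits from `src` onto a bitmap that currently holds
/// `dst_len` bits. Bits are packed LSB-first, and any unused bits in the
/// last byte of `dst` are assumed to be zero. Returns the new bit length.
/// Hypothetical helper, not part of arrow-rs.
fn append_bits(dst: &mut Vec<u8>, dst_len: usize, src: &[u8], src_len: usize) -> usize {
    if dst_len % 8 == 0 {
        // Byte-aligned destination: whole bytes can be copied directly.
        let nbytes = (src_len + 7) / 8;
        dst.truncate(dst_len / 8);
        dst.extend_from_slice(&src[..nbytes]);
        // Clear unused trailing bits so later unaligned appends can OR into them.
        let rem = src_len % 8;
        if rem != 0 {
            if let Some(last) = dst.last_mut() {
                *last &= (1u8 << rem) - 1;
            }
        }
    } else {
        // Unaligned destination: every source bit has to be shifted into place.
        for i in 0..src_len {
            let bit = (src[i / 8] >> (i % 8)) & 1;
            let pos = dst_len + i;
            if pos / 8 >= dst.len() {
                dst.push(0);
            }
            dst[pos / 8] |= bit << (pos % 8);
        }
    }
    dst_len + src_len
}

fn main() {
    let mut bitmap = Vec::new();
    // Three bits (true, false, true), then eight more starting at bit offset 3.
    let len = append_bits(&mut bitmap, 0, &[0b0000_0101], 3);
    let len = append_bits(&mut bitmap, len, &[0xFF], 8);
    println!("{} bits: {:?}", len, bitmap);
}
```

The second call takes the slow path because the destination ends mid-byte, which is the case that restricting appends to multiples of 8 would avoid.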
   
   
   One potentially simpler way to implement something similar to this would be 
to add a non-consuming finish method to the builders. This would entail copying 
the buffers, but in my experience of implementing the write path for IOx, this 
copy is insignificant in the grand scheme of query execution - even ignoring 
the heavy hitters like sorts and groups, all kernels involve copying values to 
an output array. This is especially true if after a certain number of rows you 
rotate the builders and just keep the immutable RecordBatch, thereby bounding 
the copy. What do you think?
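
A minimal sketch of the non-consuming finish idea, using only the standard library. All names here (`Int64Column`, `Int64ColumnBuilder`, `snapshot`, `RotatingWriter`, `max_rows`) are hypothetical stand-ins for illustration, not arrow-rs types or methods. The builder keeps accepting rows, `snapshot` copies the current buffer into an immutable column, and the writer rotates the builder after `max_rows` rows so the copy never exceeds that size.

```rust
use std::sync::Arc;

/// Toy stand-in for an immutable array (hypothetical, not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct Int64Column(Arc<[i64]>);

/// Toy stand-in for an array builder.
#[derive(Default)]
struct Int64ColumnBuilder {
    values: Vec<i64>,
}

impl Int64ColumnBuilder {
    fn append(&mut self, v: i64) {
        self.values.push(v);
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    /// Finish that hands off the buffer and resets the builder.
    fn finish(&mut self) -> Int64Column {
        Int64Column(std::mem::take(&mut self.values).into())
    }

    /// Non-consuming finish: copies the buffer and leaves the builder usable.
    fn snapshot(&self) -> Int64Column {
        Int64Column(self.values.as_slice().into())
    }
}

/// Keeps immutable chunks plus one active builder, rotated every `max_rows` rows.
struct RotatingWriter {
    frozen: Vec<Int64Column>,
    active: Int64ColumnBuilder,
    max_rows: usize,
}

impl RotatingWriter {
    fn new(max_rows: usize) -> Self {
        Self { frozen: Vec::new(), active: Int64ColumnBuilder::default(), max_rows }
    }

    fn append(&mut self, v: i64) {
        self.active.append(v);
        if self.active.len() >= self.max_rows {
            // Rotate: the full builder becomes an immutable chunk.
            self.frozen.push(self.active.finish());
        }
    }

    /// Readable view of all data. Frozen chunks are shared via `Arc`;
    /// only the active builder (fewer than `max_rows` rows) is copied.
    fn read(&self) -> Vec<Int64Column> {
        let mut out = self.frozen.clone();
        if self.active.len() > 0 {
            out.push(self.active.snapshot());
        }
        out
    }
}

fn main() {
    let mut writer = RotatingWriter::new(3);
    for v in 0..7 {
        writer.append(v);
    }
    // Two frozen chunks of 3 rows each, plus a snapshot of the active builder.
    println!("{:?}", writer.read());
}
```

Readers always see immutable data, and the cost of each `read` is bounded by the rotation threshold rather than by the total number of rows written.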
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.