alamb commented on code in PR #7614:
URL: https://github.com/apache/arrow-rs/pull/7614#discussion_r2132374323
##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
let b = b.get_unchecked(start..end);
let view = make_view(b, block, offset);
- self.views_builder.append(view);
+ self.views_buffer.push(view);
self.null_buffer_builder.append_non_null();
}
+ /// Appends an array to the builder.
+ /// This will flush any in-progress block and append the data buffers
+ /// and add the (adapted) views.
+ pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+ self.flush_in_progress();
+ self.completed.extend(array.data_buffers().iter().cloned());
+
+ if self.completed.is_empty() {
Review Comment:
This checks `completed.is_empty()` *after* pushing the new data buffers, which
I think means the fast path will never be taken unless the incoming array has
no data buffers. I think the check could be done before calling
`self.completed.extend`, which should improve performance.
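
A minimal sketch of that reordering, using a simplified stand-in for the builder. The `View` and `Builder` types below are hypothetical and only model the buffer bookkeeping; they are not the PR's code:

```rust
/// Hypothetical, simplified view: only the fields needed for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    length: u32,
    buffer_index: u32,
}

/// Hypothetical stand-in for the builder: completed data buffers plus views.
#[derive(Default)]
struct Builder {
    completed: Vec<Vec<u8>>,
    views: Vec<View>,
}

impl Builder {
    /// Returns true when the fast path (plain copy of the views) was taken.
    fn append_array(&mut self, buffers: &[Vec<u8>], views: &[View]) -> bool {
        // Look at the builder state *before* adding the incoming buffers,
        // so the "no earlier buffers" case is still observable.
        let starting_buffer = self.completed.len() as u32;
        self.completed.extend(buffers.iter().cloned());

        if starting_buffer == 0 {
            // No earlier buffers: the incoming buffer indexes are already valid.
            self.views.extend_from_slice(views);
            true
        } else {
            // Earlier buffers exist: shift the index of views that point
            // into a data buffer (values longer than 12 bytes).
            self.views.extend(views.iter().map(|v| {
                let mut v = *v;
                if v.length > 12 {
                    v.buffer_index += starting_buffer;
                }
                v
            }));
            false
        }
    }
}
```

With this ordering the first array appended to an empty builder takes the copy-only branch, and later arrays take the adapting branch.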
##########
arrow-select/src/concat.rs:
##########
@@ -84,6 +86,22 @@ fn fixed_size_list_capacity(arrays: &[&dyn Array], data_type: &DataType) -> Capa
}
}
+fn concat_byte_view(arrays: &[&dyn Array]) -> Result<ArrayRef, ArrowError> {
Review Comment:
Very minor: you could make this generic over `ByteViewType` rather than
having two explicit functions.
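
A rough sketch of the shape this could take, assuming a toy marker trait in place of arrow's `ByteViewType`. The `ViewType` trait, the two marker structs, and `ViewArray` below are all hypothetical stand-ins, and the body just copies values rather than using the builder:

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for arrow's `ByteViewType` marker trait.
trait ViewType {}

/// Hypothetical marker types for the string and binary flavours.
struct StringViewType;
struct BinaryViewType;
impl ViewType for StringViewType {}
impl ViewType for BinaryViewType {}

/// Toy array: a list of byte values tagged with its view type.
struct ViewArray<T: ViewType> {
    values: Vec<Vec<u8>>,
    _marker: PhantomData<T>,
}

impl<T: ViewType> ViewArray<T> {
    fn new(values: Vec<Vec<u8>>) -> Self {
        Self { values, _marker: PhantomData }
    }
}

/// A single generic function covers both flavours; the caller picks `T`.
fn concat_byte_view<T: ViewType>(arrays: &[&ViewArray<T>]) -> ViewArray<T> {
    let total = arrays.iter().map(|a| a.values.len()).sum();
    let mut values = Vec::with_capacity(total);
    for array in arrays {
        values.extend(array.values.iter().cloned());
    }
    ViewArray::new(values)
}
```

The dispatch site would then select the type parameter once per data type, e.g. `concat_byte_view::<StringViewType>(..)` or `concat_byte_view::<BinaryViewType>(..)`, instead of calling separately written functions.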
##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
let b = b.get_unchecked(start..end);
let view = make_view(b, block, offset);
- self.views_builder.append(view);
+ self.views_buffer.push(view);
self.null_buffer_builder.append_non_null();
}
+ /// Appends an array to the builder.
+ /// This will flush any in-progress block and append the data buffers
+ /// and add the (adapted) views.
+ pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+ self.flush_in_progress();
+ self.completed.extend(array.data_buffers().iter().cloned());
+
+ if self.completed.is_empty() {
+ self.views_buffer.extend_from_slice(array.views());
+ } else {
+ let starting_buffer = self.completed.len() as u32;
+
+ self.views_buffer.extend(array.views().iter().map(|v| {
+ let mut byte_view = ByteView::from(*v);
+ if byte_view.length > 12 {
+ // If the view is small enough, we can inline it
Review Comment:
```suggestion
// Small views (<=12 bytes) are inlined, so only need to update large views
```
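
To illustrate why only large views need updating, here is a small sketch that works on the raw `u128` directly. It assumes the usual view layout (length in the low 32 bits, then prefix, buffer index, and offset in successive 32-bit fields); `adapt_view` and `make_large_view` are hypothetical helper names, not functions from the PR:

```rust
/// Packs a large (non-inlined) view, assuming the layout
/// length | prefix << 32 | buffer_index << 64 | offset << 96.
fn make_large_view(length: u32, prefix: u32, buffer_index: u32, offset: u32) -> u128 {
    (length as u128)
        | ((prefix as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Shifts the buffer index of a raw view by `starting_buffer`,
/// leaving inlined views untouched.
fn adapt_view(view: u128, starting_buffer: u32) -> u128 {
    // The low 32 bits hold the value length in bytes.
    let length = view as u32;
    if length <= 12 {
        // Inlined: the other 96 bits are the value bytes themselves,
        // not a buffer reference, so there is nothing to rewrite.
        return view;
    }
    // Large view: replace only the 32-bit buffer index field.
    let buffer_index = (view >> 64) as u32;
    let mask = (u32::MAX as u128) << 64;
    (view & !mask) | (((buffer_index + starting_buffer) as u128) << 64)
}
```

Rewriting an inlined view as if it were a buffer reference would corrupt its value bytes, which is why the length check guards the update.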
##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -79,7 +79,7 @@ impl BlockSizeGrowthStrategy {
/// using [`GenericByteViewBuilder::append_block`] and then views into this block appended
/// using [`GenericByteViewBuilder::try_append_view`]
pub struct GenericByteViewBuilder<T: ByteViewType + ?Sized> {
- views_builder: BufferBuilder<u128>,
+ views_buffer: Vec<u128>,
Review Comment:
this is a great idea 💯