mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }

+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
> Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?

No. Dropping before `self.len` is updated should be fine.
However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop. I'm not sure why.
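
For readers unfamiliar with the `SetLenOnDrop` pattern mentioned above, here is a minimal sketch of the idea. It is not the code from this PR: `Buf`, its fields, and `with_capacity` are hypothetical stand-ins for `MutableBuffer`, the growth policy is simplified, and everything is written in safe Rust for clarity. The guard keeps the length in a local field while the loop runs and writes it back to the buffer exactly once, when the guard is dropped. As far as I know, the standard library's `Vec` uses a helper of the same name for the reason given in the rust-lang/rust#32155 link in the diff.

```rust
/// Hypothetical stand-in for a growable byte buffer (not arrow's `MutableBuffer`).
struct Buf {
    data: Vec<u8>, // backing storage; always fully initialized in this sketch
    len: usize,    // number of bytes logically written
}

/// Guard that tracks the length locally and stores it back when dropped.
struct SetLenOnDrop<'a> {
    len: &'a mut usize,
    local_len: usize,
}

impl<'a> SetLenOnDrop<'a> {
    fn new(len: &'a mut usize) -> Self {
        let local_len = *len;
        SetLenOnDrop { len, local_len }
    }
}

impl Drop for SetLenOnDrop<'_> {
    fn drop(&mut self) {
        // Single write-back, also executed if the iterator panics mid-loop.
        *self.len = self.local_len;
    }
}

impl Buf {
    fn with_capacity(capacity: usize) -> Self {
        Buf { data: vec![0; capacity], len: 0 }
    }

    fn extend_from_iter<I: Iterator<Item = u8>>(&mut self, iter: I) {
        // Split the borrow so the guard can hold `len` while we write to `data`.
        let Buf { data, len } = self;
        let mut guard = SetLenOnDrop::new(len);
        for item in iter {
            if guard.local_len == data.len() {
                // Simplified growth policy (assumption): double, with a floor of 64 bytes.
                let new_capacity = std::cmp::max(64, data.len() * 2);
                data.resize(new_capacity, 0);
            }
            data[guard.local_len] = item;
            guard.local_len += 1;
        }
        // `guard` goes out of scope here, which updates `self.len`.
    }
}
```

Calling `buf.extend_from_iter(0u8..100)` on a `Buf::with_capacity(2)` leaves `buf.len` at 100. The alternative discussed in the comment, updating `self.len` directly after the loop, produces the same final state when the iterator does not panic; the sketch only illustrates the pattern and makes no claim about which variant is faster.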