mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
   > Note that arrow does not support complex structs on its buffers (i.e.
   > we only support `u8-u64, i8-i64, f32 and f64`), which means that we never
   > need to call `drop` on the elements. Under this, do we still need a valid
   > `len`?
   
   No.  Dropping before `self.len` is updated should be fine.  However, in my 
benchmarking I still found that using `SetLenOnDrop` provided a small 
performance benefit compared to just updating `self.len` after the loop.  I'm 
not sure why.  Maybe the mutable borrow provides some additional hints to the 
optimizer.
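
For reference, the `SetLenOnDrop` pattern mentioned above can be sketched roughly as follows. This is a minimal illustration modeled on the private helper of the same name in the standard library's `Vec` implementation, not the code from this PR. `ToyBuffer` and `extend_bytes` are hypothetical names invented for the example, and the fixed 64-byte array stands in for the real allocation so that no `unsafe` is needed.

```rust
/// Guard that keeps the running length in a local field and writes it
/// back to the borrowed `len` when it goes out of scope.
struct SetLenOnDrop<'a> {
    len: &'a mut usize,
    local_len: usize,
}

impl<'a> SetLenOnDrop<'a> {
    fn new(len: &'a mut usize) -> Self {
        let local_len = *len;
        SetLenOnDrop { len, local_len }
    }

    fn increment_len(&mut self, by: usize) {
        self.local_len += by;
    }
}

impl Drop for SetLenOnDrop<'_> {
    fn drop(&mut self) {
        // Runs on normal exit and also if the iterator panics mid-loop.
        *self.len = self.local_len;
    }
}

/// Hypothetical stand-in for a buffer with a separate length field.
struct ToyBuffer {
    data: [u8; 64],
    len: usize,
}

impl ToyBuffer {
    fn new() -> Self {
        ToyBuffer { data: [0; 64], len: 0 }
    }

    /// Copies bytes from `iter` until the iterator ends or storage is full.
    fn extend_bytes<I: Iterator<Item = u8>>(&mut self, iter: I) {
        // Borrow the two fields separately so the guard can hold `len`
        // exclusively while the loop writes into `data`.
        let data = &mut self.data;
        let mut guard = SetLenOnDrop::new(&mut self.len);
        for byte in iter {
            if guard.local_len == data.len() {
                break; // the toy buffer never reallocates
            }
            data[guard.local_len] = byte;
            guard.increment_len(1);
        }
        // `guard` is dropped here, which stores the final length.
    }
}
```

Inside the loop only `local_len` is touched, and the single store to `self.len` happens in `drop`. Whether that explains the benchmark difference is a guess; the sketch only shows the shape of the pattern.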





