houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -577,6 +632,81 @@ where
         self
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a 
builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        if !check_array_data_type(&self.data_type(), data) {
+            return Err(ArrowError::InvalidArgumentError(
+                "Cannot append data to builder if data types are 
different".to_string(),
+            ));
+        }
+        // determine the latest offset on the builder
+        let mut cum_offset = if self.offsets_builder.len() == 0 {
+            0
+        } else {
+            // peek into buffer to get last appended offset
+            let buffer = self.offsets_builder.buffer.data();
+            let len = self.offsets_builder.len();
+            let (start, end) = ((len - 1) * 4, len * 4);
+            let slice = &buffer[start..end];
+            i32::from_le_bytes(slice.try_into().unwrap())
+        };
+        for array in data {
+            if array.child_data().len() != 1 {

Review comment:
       If we are already doing validation before mutating data (which I think 
it's the right behavior), it's best to move all validation logic into the 
initial validation loop.
   
   Another thing we can move into the initial loop is stats gathering for 
optimization purpose. For example, we can calculate aggregated element count in 
the loop. Then before we enter the mutation loop, call reserve method on 
various builders to allocate memory in one go. This way, we don't have to 
trigger multiple memory reallocation in the mutation loop.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to