nevi-me commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436354577



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -577,6 +632,81 @@ where
         self
     }
 
+    /// Appends data from other arrays into the builder
+    ///
+    /// This is most useful when concatenating arrays of the same type into a 
builder.
+    fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> {
+        if !check_array_data_type(&self.data_type(), data) {
+            return Err(ArrowError::InvalidArgumentError(
+                "Cannot append data to builder if data types are 
different".to_string(),
+            ));
+        }
+        // determine the latest offset on the builder
+        let mut cum_offset = if self.offsets_builder.len() == 0 {
+            0
+        } else {
+            // peek into buffer to get last appended offset
+            let buffer = self.offsets_builder.buffer.data();
+            let len = self.offsets_builder.len();
+            let (start, end) = ((len - 1) * 4, len * 4);
+            let slice = &buffer[start..end];
+            i32::from_le_bytes(slice.try_into().unwrap())
+        };
+        for array in data {
+            if array.child_data().len() != 1 {

Review comment:
       > But I do think we should make sure error handling behavior is 
consistent, i.e. invalid input should not lead to partial append to the array. 
If we are to use ArrayDataRef as input type, then I feel like we do need to 
have custom validation logic for each array type as you mentioned.
   
   Thanks, I agree. Changes made in my latest commit.
   
   > The idea of using ArrayRef as input type is interesting. It does look like 
a simpler interface for end users and could simplify the error handling logic. 
   
   The preferred way to convert `&[ArrayRef]` to `ArrayRef` will be the 
`concat` kernel that you've added. concat might then forego its current 
validation, and potentially be like:
   
   ```rust
   pub fn concat(array_list: &[ArrayRef]) -> Result<ArrayRef> {
       // get data type from first element
       // create builder for data type (this'll have to cater for structs and 
lists)
       // pass `ArrayDataRef`to builder
       // finish builder and return `ArrayRef`
   }
   ```
   
   > What's the downside of using ArrayRef here compared to ArrayDataRef?
   
   `ArrayDataRef` is more flexible. If someone is creating Arrow data from raw 
data, there currently isn't much flexibility for them, especially when working 
with nested data structures. It might be more convenient to then create 
`ArrayData` instead of going all the way to create an array only to append it 
to a builder.
   Constructing an `ArrayRef` to append is an extra step and at worst requires 
going through `arrow::utils::make_array(data: ArrayRef)`.
   
   The upside of `ArrayRef` is skipping the validation checks, though I wonder 
what cost the checks result in. We can wait for other reviewers' opinions on 
their necessity.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to