houqp commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311
########## File path: rust/arrow/src/array/builder.rs ########## @@ -577,6 +632,81 @@ where self } + /// Appends data from other arrays into the builder + /// + /// This is most useful when concatenating arrays of the same type into a builder. + fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { + if !check_array_data_type(&self.data_type(), data) { + return Err(ArrowError::InvalidArgumentError( + "Cannot append data to builder if data types are different".to_string(), + )); + } + // determine the latest offset on the builder + let mut cum_offset = if self.offsets_builder.len() == 0 { + 0 + } else { + // peek into buffer to get last appended offset + let buffer = self.offsets_builder.buffer.data(); + let len = self.offsets_builder.len(); + let (start, end) = ((len - 1) * 4, len * 4); + let slice = &buffer[start..end]; + i32::from_le_bytes(slice.try_into().unwrap()) + }; + for array in data { + if array.child_data().len() != 1 { Review comment: If we are already doing validation before mutating data (which I think it's the right behavior), it's best to move all validation logic into the initial validation loop. Another thing we can move into the initial loop is stats gathering for optimization purpose. For example, we can calculate aggregated element count in the loop. Then before we enter the mutation loop, call reserve method on various builders to allocate memory in one go. This way, we don't have to trigger multiple memory reallocation in the mutation loop, especially for bitmap_builder. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org