[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
nevi-me commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436308395 ## File path: rust/arrow/src/array/builder.rs ## @@ -841,12 +1048,91 @@ impl ArrayBuilder for StringBuilder { } } +// Helper function for appending Binary and Utf8 data +fn append_binary_data( +builder: &mut ListBuilder, +data_type: &DataType, +data: &[ArrayDataRef], +) -> Result<()> { +if !check_array_data_type(data_type, data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +for array in data { +// convert string to List to reuse list's cast +let int_data = &array.buffers()[1]; +let int_data = Arc::new(ArrayData::new( +DataType::UInt8, +int_data.len(), +None, +None, +0, +vec![int_data.clone()], +vec![], +)) as ArrayDataRef; +let list_data = Arc::new(ArrayData::new( +DataType::List(Box::new(DataType::UInt8)), +array.len(), +None, +array.null_buffer().map(|buf| buf.clone()), +array.offset(), +vec![(&array.buffers()[0]).clone()], +vec![int_data], +)); +builder.append_data(&[list_data])?; +} +Ok(()) +} + impl ArrayBuilder for FixedSizeBinaryBuilder { /// Returns the builder as a non-mutable `Any` reference. fn as_any(&self) -> &Any { self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +for array in data { +// convert string to FixedSizeList to reuse list's append +let int_data = &array.buffers()[0]; +let int_data = Arc::new(ArrayData::new( +DataType::UInt8, +int_data.len(), +None, +None, +0, +vec![int_data.clone()], +vec![], +)) as ArrayDataRef; +let list_data = Arc::new(ArrayData::new( +DataType::FixedSizeList(Box::new(DataType::UInt8), self.builder.list_len), Review comment: I don't understand you entirely here, mind clarifying? Here I'm reconstructing the data as a FSList from FSBinary, otherwise I'd get an error about data types not being the same. I used list_len directly because i32 is a Copy type. I can change to builder.value_length() This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
nevi-me commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436307832 ## File path: rust/arrow/src/array/builder.rs ## @@ -577,6 +632,81 @@ where self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +// determine the latest offset on the builder +let mut cum_offset = if self.offsets_builder.len() == 0 { +0 +} else { +// peek into buffer to get last appended offset +let buffer = self.offsets_builder.buffer.data(); +let len = self.offsets_builder.len(); +let (start, end) = ((len - 1) * 4, len * 4); +let slice = &buffer[start..end]; +i32::from_le_bytes(slice.try_into().unwrap()) +}; +for array in data { +if array.child_data().len() != 1 { Review comment: The validation is only for data type, so we'd have to make a call on whether passing array data that's invalid should be undefined behaviour. If we passed in ArrayRef, we'd be certain that data is valid, but otherwise nothing stops someone from manually constructing ArrayDataRef incorrectly and passing it in. The validation check here at least give the user feedback, otherwise it would be a generic bounds error. I could alternatively customise the validation for different types, with potential allocation for both value and bitmap builders for primitive arrays. It becomes a slippery slope for lists and structs because those can be deeply nested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache
kou closed pull request #7366: URL: https://github.com/apache/arrow/pull/7366 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache
kou commented on pull request #7366: URL: https://github.com/apache/arrow/pull/7366#issuecomment-640129138 +1 In "C GLib & Ruby / AMD64 Windows MinGW 64 GLib & Ruby": 6min -> 1min In "C++ / AMD64 Windows MinGW 32 C++": 18min -> 2min In "C++ / AMD64 Windows MinGW 64 C++": 18min -> 2min This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings
nealrichardson commented on a change in pull request #7318: URL: https://github.com/apache/arrow/pull/7318#discussion_r436304103 ## File path: r/R/chunked-array.R ## @@ -75,23 +75,15 @@ ChunkedArray <- R6Class("ChunkedArray", inherit = ArrowObject, if (is.integer(i)) { i <- Array$create(i) } - # Invalid: Kernel does not support chunked array arguments - # so use the old method for both cases - if (inherits(i, "ChunkedArray")) { -return(shared_ptr(ChunkedArray, ChunkedArray__TakeChunked(self, i))) - } - assert_is(i, "Array") - return(shared_ptr(ChunkedArray, ChunkedArray__Take(self, i))) + # Invalid: Tried executing function with non-value type: ChunkedArray Review comment: Done in https://github.com/apache/arrow/pull/7308/commits/247a1048e85871214e0456f3ee347a9b3e5388da This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
houqp commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436303006 ## File path: rust/arrow/src/array/builder.rs ## @@ -841,12 +1048,91 @@ impl ArrayBuilder for StringBuilder { } } +// Helper function for appending Binary and Utf8 data +fn append_binary_data( +builder: &mut ListBuilder, +data_type: &DataType, +data: &[ArrayDataRef], +) -> Result<()> { +if !check_array_data_type(data_type, data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +for array in data { +// convert string to List to reuse list's cast +let int_data = &array.buffers()[1]; +let int_data = Arc::new(ArrayData::new( +DataType::UInt8, +int_data.len(), +None, +None, +0, +vec![int_data.clone()], +vec![], +)) as ArrayDataRef; +let list_data = Arc::new(ArrayData::new( +DataType::List(Box::new(DataType::UInt8)), +array.len(), +None, +array.null_buffer().map(|buf| buf.clone()), +array.offset(), +vec![(&array.buffers()[0]).clone()], +vec![int_data], +)); +builder.append_data(&[list_data])?; +} +Ok(()) +} + impl ArrayBuilder for FixedSizeBinaryBuilder { /// Returns the builder as a non-mutable `Any` reference. fn as_any(&self) -> &Any { self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +for array in data { +// convert string to FixedSizeList to reuse list's append +let int_data = &array.buffers()[0]; +let int_data = Arc::new(ArrayData::new( +DataType::UInt8, +int_data.len(), +None, +None, +0, +vec![int_data.clone()], +vec![], +)) as ArrayDataRef; +let list_data = Arc::new(ArrayData::new( +DataType::FixedSizeList(Box::new(DataType::UInt8), self.builder.list_len), Review comment: do we need to validate list_len for each ArrayData as well? Also I recommend calling value_length() method to get list_len value instead or remove value_length entirely and use list_len directly everywhere. ## File path: rust/arrow/src/array/builder.rs ## @@ -1018,6 +1304,48 @@ impl ArrayBuilder for StructBuilder { self.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +for array in data { +let len = array.len(); +if len == 0 { +continue; +} +let offset = array.offset(); +for (builder, child_data) in self +.field_builders +.iter_mut() +.zip(array.child_data().iter()) +{ +// slice child_data to account for offsets +let child_array = make_array(child_data.clone()); +let sliced = child_array.slice(offset, len); +builder.append_data(&[sliced.data()])?; +} +// append array length +self.len += len; +for i in 0..len { +// account for offset as `ArrayData` does not +self.bitmap_builder.append(array.is_valid(offset + i))?; Review comment: looks like we missed the reserve call before the loop here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache
github-actions[bot] commented on pull request #7366: URL: https://github.com/apache/arrow/pull/7366#issuecomment-640122277 https://issues.apache.org/jira/browse/ARROW-8781 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou opened a new pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache
kou opened a new pull request #7366: URL: https://github.com/apache/arrow/pull/7366 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7126: [C++][Python] Remove boost_regex dependency
nealrichardson commented on pull request #7126: URL: https://github.com/apache/arrow/pull/7126#issuecomment-640121051 FTR Thrift 0.13 (and current master) still requires this boost header: https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TTransportException.h#L23 Can we just vendor it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
houqp commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311 ## File path: rust/arrow/src/array/builder.rs ## @@ -577,6 +632,81 @@ where self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +// determine the latest offset on the builder +let mut cum_offset = if self.offsets_builder.len() == 0 { +0 +} else { +// peek into buffer to get last appended offset +let buffer = self.offsets_builder.buffer.data(); +let len = self.offsets_builder.len(); +let (start, end) = ((len - 1) * 4, len * 4); +let slice = &buffer[start..end]; +i32::from_le_bytes(slice.try_into().unwrap()) +}; +for array in data { +if array.child_data().len() != 1 { Review comment: If we are already doing validation before mutating data (which I think it's the right behavior), it's best to move all validation logic into the initial validation loop. Another thing we can move into the initial loop is stats gathering for optimization purpose. For example, we can calculate aggregated element count in the loop. Then before we enter the mutation loop, call reserve method on various builders to allocate memory in one go. This way, we don't have to trigger multiple memory reallocation in the mutation loop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
houqp commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311 ## File path: rust/arrow/src/array/builder.rs ## @@ -577,6 +632,81 @@ where self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +if !check_array_data_type(&self.data_type(), data) { +return Err(ArrowError::InvalidArgumentError( +"Cannot append data to builder if data types are different".to_string(), +)); +} +// determine the latest offset on the builder +let mut cum_offset = if self.offsets_builder.len() == 0 { +0 +} else { +// peek into buffer to get last appended offset +let buffer = self.offsets_builder.buffer.data(); +let len = self.offsets_builder.len(); +let (start, end) = ((len - 1) * 4, len * 4); +let slice = &buffer[start..end]; +i32::from_le_bytes(slice.try_into().unwrap()) +}; +for array in data { +if array.child_data().len() != 1 { Review comment: If we are already doing validation before mutating data (which I think it's the right behavior), it's best to move all validation logic into the initial validation loop. Another thing we can move into the initial loop is stats gathering for optimization purpose. For example, we can calculate aggregated element count in the loop. Then before we enter the mutation loop, call reserve method on various builders to allocate memory in one go. This way, we don't have to trigger multiple memory reallocation in the mutation loop, especially for bitmap_builder. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7126: [C++][Python] Remove boost_regex dependency
wesm closed pull request #7126: URL: https://github.com/apache/arrow/pull/7126 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7126: [C++][Python] Remove boost_regex dependency
wesm commented on pull request #7126: URL: https://github.com/apache/arrow/pull/7126#issuecomment-640060481 Alright. We could still remove the boost::regex dependency since Thrift does not require that, but I'll close this for now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release
wesm commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-640059970 @BryanCutler @kszucs what do you think about setting up a Spark 3.x integration test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6134: [Python][WIP] Refactor python builtins conversion
wesm commented on pull request #6134: URL: https://github.com/apache/arrow/pull/6134#issuecomment-640059884 Closing this, please reopen when you have something you want reviewed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #6134: [Python][WIP] Refactor python builtins conversion
wesm closed pull request #6134: URL: https://github.com/apache/arrow/pull/6134 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7340: ARROW-8979: [C++] Refine bitmap unaligned word access
wesm closed pull request #7340: URL: https://github.com/apache/arrow/pull/7340 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
nevi-me commented on pull request #7365: URL: https://github.com/apache/arrow/pull/7365#issuecomment-640031722 @houqp I've pushed some changes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
nevi-me commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436257140 ## File path: rust/arrow/src/array/builder.rs ## @@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder { self.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +for array in data { +if let DataType::Struct(fields) = array.data_type() { +if &self.fields != fields { +return Err(ArrowError::InvalidArgumentError( +"Struct arrays are not the same".to_string(), +)); +} +let len = array.len(); +if len == 0 { +continue; +} +let offset = array.offset(); +let results: Result> = self +.field_builders +.iter_mut() +.zip(array.child_data()) +.map(|(builder, child_data)| { +// slice child_data to account for offsets +let child_array = make_array(child_data.clone()); +let sliced = child_array.slice(offset, len); +builder.append_data(&[sliced.data()]) +}) +.collect(); +results?; Review comment: Thanks, I've changed it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
nevi-me commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436254430 ## File path: rust/arrow/src/array/builder.rs ## @@ -450,6 +455,44 @@ impl ArrayBuilder for PrimitiveBuilder { self.values_builder.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +let mul = T::get_bit_width() / 8; +for array in data { +if array.data_type() != &T::get_data_type() { Review comment: I considered this, I ideally wanted to leave the responsibility to the user, as we'd provide convenience methods like concat, but since this is public, I should check data types first, then return early if there's a mismatch. I'll update This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on a change in pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression
houqp commented on a change in pull request #7324: URL: https://github.com/apache/arrow/pull/7324#discussion_r436252687 ## File path: rust/arrow/src/compute/kernels/concat.rs ## @@ -0,0 +1,395 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Defines concat kernel for `ArrayRef` +//! +//! Example: +//! +//! ``` +//! use std::sync::Arc; +//! use arrow::array::{ArrayRef, StringArray}; +//! use arrow::compute::concat; +//! +//! let arr = concat(&vec![ +//! Arc::new(StringArray::from(vec!["hello", "world"])) as ArrayRef, +//! Arc::new(StringArray::from(vec!["!"])) as ArrayRef, +//! ]).unwrap(); +//! assert_eq!(arr.len(), 3); +//! ``` + +use std::sync::Arc; + +use crate::array::*; +use crate::datatypes::*; +use crate::error::{ArrowError, Result}; + +use TimeUnit::*; + +/// Concatenate multiple `ArrayRef` with the same type. +/// +/// Returns a new ArrayRef. +pub fn concat(array_list: &Vec) -> Result { +let mut data_type: Option = None; +let array_data_list = &array_list +.iter() +.map(|a| { +let array_data = a.data_ref().clone(); +let curr_data_type = array_data.data_type().clone(); +match &data_type { +Some(t) => { +if t != &curr_data_type { +return Err(ArrowError::ComputeError( +"Cannot concat arrays with data type".to_string(), +)); +} +} +None => { +data_type = Some(curr_data_type); +} +} +Ok(array_data) +}) +.collect::>>()?; + +let data_type = match data_type { +None => { +return Err(ArrowError::ComputeError( +"Cannot concat 0 array".to_string(), +)); +} +Some(t) => t, +}; +match data_type { +DataType::Utf8 => concat_raw_string(array_data_list), +DataType::Boolean => concat_primitive::(array_data_list), +DataType::Int8 => concat_raw_primitive::(array_data_list), +DataType::Int16 => concat_raw_primitive::(array_data_list), +DataType::Int32 => concat_raw_primitive::(array_data_list), +DataType::Int64 => concat_raw_primitive::(array_data_list), +DataType::UInt8 => concat_raw_primitive::(array_data_list), +DataType::UInt16 => concat_raw_primitive::(array_data_list), +DataType::UInt32 => concat_raw_primitive::(array_data_list), +DataType::UInt64 => concat_raw_primitive::(array_data_list), +DataType::Float32 => concat_raw_primitive::(array_data_list), +DataType::Float64 => concat_raw_primitive::(array_data_list), +DataType::Date32(_) => concat_raw_primitive::(array_data_list), +DataType::Date64(_) => concat_raw_primitive::(array_data_list), +DataType::Time32(Second) => { +concat_raw_primitive::(array_data_list) +} +DataType::Time32(Millisecond) => { +concat_raw_primitive::(array_data_list) +} +DataType::Time64(Microsecond) => { +concat_raw_primitive::(array_data_list) +} +DataType::Time64(Nanosecond) => { +concat_raw_primitive::(array_data_list) +} +DataType::Timestamp(Second, _) => { +concat_raw_primitive::(array_data_list) +} +DataType::Timestamp(Millisecond, _) => { +concat_raw_primitive::(array_data_list) +} +DataType::Timestamp(Microsecond, _) => { +concat_raw_primitive::(array_data_list) +} +DataType::Timestamp(Nanosecond, _) => { +concat_raw_primitive::(array_data_list) +} +DataType::Interval(IntervalUnit::YearMonth) => { +concat_raw_primitive::(array_data_list) +} +DataType::Interval(IntervalUnit::DayTime) => { +concat_raw_primitive::(array_data_list) +} +DataType::Duration(TimeUnit::Second) => { +concat_raw_primitive::(array_data_list) +} +DataType::Duration(TimeUnit::Millisecond) => { +concat_raw_primitiv
[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders
houqp commented on a change in pull request #7365: URL: https://github.com/apache/arrow/pull/7365#discussion_r436250767 ## File path: rust/arrow/src/array/builder.rs ## @@ -577,6 +620,78 @@ where self } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +// determine the latest offset on the builder +let mut cum_offset = if self.offsets_builder.len() == 0 { +0 +} else { +// peek into buffer to get last appended offset +let buffer = self.offsets_builder.buffer.data(); +let len = self.offsets_builder.len(); +let (start, end) = ((len - 1) * 4, len * 4); +let slice = &buffer[start..end]; +i32::from_le_bytes(slice.try_into().unwrap()) +}; +for array in data { +if let DataType::List(_) = array.data_type() { +if array.child_data().len() != 1 { +return Err(ArrowError::InvalidArgumentError( +"When appending list arrays, data must contain 1 child_data element" +.to_string(), +)); +} +let len = array.len(); +if len == 0 { +continue; +} +let offset = array.offset(); + +// `typed_data` is unsafe, however this call is safe as `ListArray` has i32 offsets +unsafe { Review comment: looks like scope of this unsafe block can be reduced? ## File path: rust/arrow/src/array/builder.rs ## @@ -450,6 +455,44 @@ impl ArrayBuilder for PrimitiveBuilder { self.values_builder.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +let mul = T::get_bit_width() / 8; +for array in data { +if array.data_type() != &T::get_data_type() { Review comment: if one of the array data has the wrong type, the append will be partially applied, is this intentional? ## File path: rust/arrow/src/array/builder.rs ## @@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder { self.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +for array in data { +if let DataType::Struct(fields) = array.data_type() { +if &self.fields != fields { +return Err(ArrowError::InvalidArgumentError( +"Struct arrays are not the same".to_string(), Review comment: nitpick, perhaps `have different fields` would be a better wording here. ## File path: rust/arrow/src/array/builder.rs ## @@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder { self.len } +/// Appends data from other arrays into the builder +/// +/// This is most useful when concatenating arrays of the same type into a builder. +fn append_data(&mut self, data: &[ArrayDataRef]) -> Result<()> { +for array in data { +if let DataType::Struct(fields) = array.data_type() { +if &self.fields != fields { +return Err(ArrowError::InvalidArgumentError( +"Struct arrays are not the same".to_string(), +)); +} +let len = array.len(); +if len == 0 { +continue; +} +let offset = array.offset(); +let results: Result> = self +.field_builders +.iter_mut() +.zip(array.child_data()) +.map(|(builder, child_data)| { +// slice child_data to account for offsets +let child_array = make_array(child_data.clone()); +let sliced = child_array.slice(offset, len); +builder.append_data(&[sliced.data()]) +}) +.collect(); +results?; Review comment: minor, if results is not used, then it's better to use a for loop over zipped iterators above to avoid creating a temporary vector at the end. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries ab
[GitHub] [arrow] cyb70289 commented on pull request #7340: ARROW-8979: [C++] Refine bitmap unaligned word access
cyb70289 commented on pull request #7340: URL: https://github.com/apache/arrow/pull/7340#issuecomment-640014574 rebased This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #6578: ARROW-7371: [GLib] Add GLib binding of Dataset
kou closed pull request #6578: URL: https://github.com/apache/arrow/pull/6578 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support
github-actions[bot] commented on pull request #7319: URL: https://github.com/apache/arrow/pull/7319#issuecomment-640010989 https://issues.apache.org/jira/browse/ARROW-8289 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on pull request #6578: ARROW-7371: [GLib] Add GLib binding of Dataset
mrkn commented on pull request #6578: URL: https://github.com/apache/arrow/pull/6578#issuecomment-640010351 @kou thank you very much! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me closed pull request #7359: ARROW-8723: [Rust] Remove SIMD specific benchmark code
nevi-me closed pull request #7359: URL: https://github.com/apache/arrow/pull/7359 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me closed pull request #7360: ARROW-9047: [Rust] Fix a segfault when setting zero bits in a zero-length bitset.
nevi-me closed pull request #7360: URL: https://github.com/apache/arrow/pull/7360 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] houqp commented on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression
houqp commented on pull request #7324: URL: https://github.com/apache/arrow/pull/7324#issuecomment-640006429 @andygrove sent PR to sqlparser: https://github.com/andygrove/sqlparser-rs/pull/185. I recommend we focus on reviewing and getting https://github.com/apache/arrow/pull/7365 merged first. Then I can refactor this PR to use append_data methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org