gabotechs commented on code in PR #7933: URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2222829902
########## arrow-arith/src/aggregate.rs: ########## @@ -1701,4 +1771,119 @@ mod tests { sum_checked(&a).expect_err("overflow should be detected"); sum_array_checked::<Int32Type, _>(&a).expect_err("overflow should be detected"); } + mod ree_aggregation { + use super::*; + use arrow_array::{RunArray, Int32Array, Int64Array, Float64Array}; + use arrow_array::types::{Int32Type, Int64Type, Float64Type}; Review Comment: How about adding a test that checks when everything is null? with some values that look like this: ```rust let values = Int32Array::from(vec![None, None, None]); ``` ########## arrow-array/src/array/mod.rs: ########## @@ -68,6 +68,9 @@ mod run_array; pub use run_array::*; +// Re-export the unwrap_ree_array function for public use +pub use run_array::unwrap_ree_array; Review Comment: Actually, this whole export seems redundant, as it's already being exported just to lines above ########## arrow-arith/src/aggregate.rs: ########## @@ -1701,4 +1771,119 @@ mod tests { sum_checked(&a).expect_err("overflow should be detected"); sum_array_checked::<Int32Type, _>(&a).expect_err("overflow should be detected"); } + mod ree_aggregation { + use super::*; + use arrow_array::{RunArray, Int32Array, Int64Array, Float64Array}; + use arrow_array::types::{Int32Type, Int64Type, Float64Type}; + + #[test] + fn test_ree_sum_array_basic() { + // REE array: [10, 10, 20, 30, 30,30] (logical length 6) + let run_ends = Int32Array::from(vec![2, 3, 6]); + let values = Int32Array::from(vec![10, 20, 30]); + let run_array = RunArray::<Int32Type>::try_new(&run_ends, &values).unwrap(); + + + let typed_array = run_array.downcast::<Int32Array>().unwrap(); + + let result = sum_array::<Int32Type, _>(typed_array); + assert_eq!(result, Some(130)); // 10+10+20+30+30+30 = 130 + } + + #[test] + fn test_ree_sum_array_with_nulls() { + // REE array with nulls: [10, NULL, 20, NULL, 30] + let run_ends = Int32Array::from(vec![1, 2, 3, 4, 5]); + let values = Int32Array::from(vec![10, 0, 20, 0, 30]); // 0 represents null Review Comment: 🤔 0 represents null? I though 0 represents 0. Maybe it would be more accurate to do something like: ```rust let values = Int32Array::from(vec![Some(10), None, Some(20), None, Some(30)]); ``` ########## arrow-arith/src/aggregate.rs: ########## @@ -17,7 +17,7 @@ //! Defines aggregations over Arrow arrays. -use arrow_array::cast::*; +use arrow_array::cast::{*}; Review Comment: This does not look properly formatted, did you remember to run `cargo fmt`? ########## arrow-ord/src/cmp.rs: ########## @@ -224,6 +223,14 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> Result<BooleanArray, let l_nulls = l.logical_nulls(); let r_nulls = r.logical_nulls(); + // 1. Handle REE arrays first - extract logical values + use arrow_array::unwrap_ree_array; Review Comment: Usually in Rust import statements are declared at the beginning of the file. In practice it does not matter much, but it's nice to be consistent with the conventions from previous contributions so that reading different chunks of code is not an inconsistent experience. ########## arrow-array/src/array/run_array.rs: ########## @@ -23,11 +23,7 @@ use arrow_data::{ArrayData, ArrayDataBuilder}; use arrow_schema::{ArrowError, DataType, Field}; use crate::{ - builder::StringRunBuilder, - make_array, - run_iterator::RunArrayIter, - types::{Int16Type, Int32Type, Int64Type, RunEndIndexType}, - Array, ArrayAccessor, ArrayRef, PrimitiveArray, + builder::StringRunBuilder, cast::AsArray, make_array, run_iterator::RunArrayIter, types::{Int16Type, Int32Type, Int64Type, RunEndIndexType}, Array, ArrayAccessor, ArrayRef, ArrowPrimitiveType, PrimitiveArray, Int16Array, Int32Array, Int64Array Review Comment: It also looks like this file is not properly formatted. I'd try to format the code with `cargo fmt`. ########## arrow-array/src/array/run_array.rs: ########## @@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> { values: self.values.clone(), } } + /// Expands the REE array to its logical form + pub fn expand_to_logical<T: ArrowPrimitiveType>(&self) -> Result<Box<dyn Array>, ArrowError> + where + T::Native: Default, + { + let typed_ree = self.downcast::<PrimitiveArray<T>>() + .ok_or_else(|| ArrowError::InvalidArgumentError("Failed to downcast to typed REE".to_string()))?; + + let mut builder = PrimitiveArray::<T>::builder(typed_ree.len()); + for i in 0..typed_ree.len() { + if typed_ree.is_null(i) { + builder.append_null(); + } else { + builder.append_value(typed_ree.value(i)); + } + } + Ok(Box::new(builder.finish())) + } + /// Unwraps a REE array into a logical array + pub fn unwrap_ree_array(array: &dyn Array) -> Option<Box<dyn Array>> { Review Comment: For example, in Rust, usually functions that convert from one type to another in a way that does not consume the original type are named something like `to_x`, and functions that do the same but consuming the original type are named `into_x`. Maybe this can be called `to_expanded_array` or something similar, as it's not consuming the original array. ########## arrow-array/src/array/run_array.rs: ########## @@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> { values: self.values.clone(), } } + /// Expands the REE array to its logical form + pub fn expand_to_logical<T: ArrowPrimitiveType>(&self) -> Result<Box<dyn Array>, ArrowError> + where + T::Native: Default, + { + let typed_ree = self.downcast::<PrimitiveArray<T>>() + .ok_or_else(|| ArrowError::InvalidArgumentError("Failed to downcast to typed REE".to_string()))?; + + let mut builder = PrimitiveArray::<T>::builder(typed_ree.len()); + for i in 0..typed_ree.len() { + if typed_ree.is_null(i) { + builder.append_null(); + } else { + builder.append_value(typed_ree.value(i)); + } + } + Ok(Box::new(builder.finish())) + } + /// Unwraps a REE array into a logical array + pub fn unwrap_ree_array(array: &dyn Array) -> Option<Box<dyn Array>> { Review Comment: the word `unwrap` in Rust is usually referred to "unwrapping" a Result or an Option in other to get the inner value, panicking if it's not possible. Maybe we can find another word for this function less subject to be mistaken with that behavior. ########## arrow-array/src/array/mod.rs: ########## @@ -68,6 +68,9 @@ mod run_array; pub use run_array::*; +// Re-export the unwrap_ree_array function for public use +pub use run_array::unwrap_ree_array; Review Comment: This comment seems like it's pretty much just stating how the Rust compiler works rather than adding extra information not present in the code itself. If you ask me, I would just remove it. ########## arrow-array/src/array/run_array.rs: ########## @@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> { values: self.values.clone(), } } + /// Expands the REE array to its logical form Review Comment: ```suggestion } /// Expands the REE array to its logical form ``` All the other functions in this file seem to be leaving a space between them. I would try to be consistent with the previous formatting of the file ########## arrow-ord/src/lib.rs: ########## @@ -49,6 +49,7 @@ )] #![cfg_attr(docsrs, feature(doc_auto_cfg))] #![warn(missing_docs)] +pub use arrow_array::downcast_primitive_array; Review Comment: Do you think this needs to be made public? it looks unrelated to REE. Maybe you want it to be public just inside the crate? `pub(crate) use ...` ########## arrow-ord/src/cmp.rs: ########## @@ -224,6 +223,14 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> Result<BooleanArray, let l_nulls = l.logical_nulls(); let r_nulls = r.logical_nulls(); + // 1. Handle REE arrays first - extract logical values + use arrow_array::unwrap_ree_array; + let l_ree = unwrap_ree_array(l); + let r_ree = unwrap_ree_array(r); Review Comment: I see that for we are always expanding REE arrays for comparing them, but wouldn't it be nice to not expand them and compare them in the compacted form if both comparison operands are REE? I imagine that it should not be very difficult to do, and I expect it to be very good for performance. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org