gabotechs commented on code in PR #7933:
URL: https://github.com/apache/arrow-rs/pull/7933#discussion_r2222829902


##########
arrow-arith/src/aggregate.rs:
##########
@@ -1701,4 +1771,119 @@ mod tests {
         sum_checked(&a).expect_err("overflow should be detected");
         sum_array_checked::<Int32Type, _>(&a).expect_err("overflow should be 
detected");
     }
+    mod ree_aggregation {
+        use super::*;
+        use arrow_array::{RunArray, Int32Array, Int64Array, Float64Array};
+        use arrow_array::types::{Int32Type, Int64Type, Float64Type};

Review Comment:
   How about adding a test that checks when everything is null? with some 
values that look like this:
   
   ```rust
               let values = Int32Array::from(vec![None, None, None]);
   ```



##########
arrow-array/src/array/mod.rs:
##########
@@ -68,6 +68,9 @@ mod run_array;
 
 pub use run_array::*;
 
+// Re-export the unwrap_ree_array function for public use
+pub use run_array::unwrap_ree_array;

Review Comment:
   Actually, this whole export seems redundant, as it's already being exported 
just to lines above



##########
arrow-arith/src/aggregate.rs:
##########
@@ -1701,4 +1771,119 @@ mod tests {
         sum_checked(&a).expect_err("overflow should be detected");
         sum_array_checked::<Int32Type, _>(&a).expect_err("overflow should be 
detected");
     }
+    mod ree_aggregation {
+        use super::*;
+        use arrow_array::{RunArray, Int32Array, Int64Array, Float64Array};
+        use arrow_array::types::{Int32Type, Int64Type, Float64Type};
+
+        #[test]
+        fn test_ree_sum_array_basic() {
+            // REE array: [10, 10, 20, 30, 30,30] (logical length 6)
+            let run_ends = Int32Array::from(vec![2, 3, 6]);
+            let values = Int32Array::from(vec![10, 20, 30]);
+            let run_array = RunArray::<Int32Type>::try_new(&run_ends, 
&values).unwrap();
+
+            
+            let typed_array = run_array.downcast::<Int32Array>().unwrap();
+
+            let result = sum_array::<Int32Type, _>(typed_array);
+            assert_eq!(result, Some(130)); // 10+10+20+30+30+30 = 130
+        }
+
+        #[test]
+        fn test_ree_sum_array_with_nulls() {
+            // REE array with nulls: [10, NULL, 20, NULL, 30]
+            let run_ends = Int32Array::from(vec![1, 2, 3, 4, 5]);
+            let values = Int32Array::from(vec![10, 0, 20, 0, 30]); // 0 
represents null

Review Comment:
   🤔 0 represents null? I though 0 represents 0. Maybe it would be more 
accurate to do something like: 
   
   ```rust
               let values = Int32Array::from(vec![Some(10), None, Some(20), 
None, Some(30)]);
   ```
   



##########
arrow-arith/src/aggregate.rs:
##########
@@ -17,7 +17,7 @@
 
 //! Defines aggregations over Arrow arrays.
 
-use arrow_array::cast::*;
+use arrow_array::cast::{*};

Review Comment:
   This does not look properly formatted, did you remember to run `cargo fmt`?



##########
arrow-ord/src/cmp.rs:
##########
@@ -224,6 +223,14 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> 
Result<BooleanArray,
     let l_nulls = l.logical_nulls();
     let r_nulls = r.logical_nulls();
 
+    // 1. Handle REE arrays first - extract logical values
+    use arrow_array::unwrap_ree_array;

Review Comment:
   Usually in Rust import statements are declared at the beginning of the file. 
In practice it does not matter much, but it's nice to be consistent with the 
conventions from previous contributions so that reading different chunks of 
code is not an inconsistent experience.



##########
arrow-array/src/array/run_array.rs:
##########
@@ -23,11 +23,7 @@ use arrow_data::{ArrayData, ArrayDataBuilder};
 use arrow_schema::{ArrowError, DataType, Field};
 
 use crate::{
-    builder::StringRunBuilder,
-    make_array,
-    run_iterator::RunArrayIter,
-    types::{Int16Type, Int32Type, Int64Type, RunEndIndexType},
-    Array, ArrayAccessor, ArrayRef, PrimitiveArray,
+    builder::StringRunBuilder, cast::AsArray, make_array, 
run_iterator::RunArrayIter, types::{Int16Type, Int32Type, Int64Type, 
RunEndIndexType}, Array, ArrayAccessor, ArrayRef, ArrowPrimitiveType, 
PrimitiveArray, Int16Array, Int32Array, Int64Array

Review Comment:
   It also looks like this file is not properly formatted. I'd try to format 
the code with `cargo fmt`.



##########
arrow-array/src/array/run_array.rs:
##########
@@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> {
             values: self.values.clone(),
         }
     }
+    /// Expands the REE array to its logical form
+    pub fn expand_to_logical<T: ArrowPrimitiveType>(&self) -> Result<Box<dyn 
Array>, ArrowError> 
+    where
+        T::Native: Default,
+    {
+        let typed_ree = self.downcast::<PrimitiveArray<T>>()
+            .ok_or_else(|| ArrowError::InvalidArgumentError("Failed to 
downcast to typed REE".to_string()))?;
+        
+        let mut builder = PrimitiveArray::<T>::builder(typed_ree.len());
+        for i in 0..typed_ree.len() {
+            if typed_ree.is_null(i) {
+                builder.append_null();
+            } else {
+                builder.append_value(typed_ree.value(i));
+            }
+        }
+        Ok(Box::new(builder.finish()))
+    }
+    /// Unwraps a REE array into a logical array
+    pub fn unwrap_ree_array(array: &dyn Array) -> Option<Box<dyn Array>> {

Review Comment:
   For example, in Rust, usually functions that convert from one type to 
another in a way that does not consume the original type are named something 
like `to_x`, and functions that do the same but consuming the original type are 
named `into_x`.
   
   Maybe this can be called `to_expanded_array` or something similar, as it's 
not consuming the original array.



##########
arrow-array/src/array/run_array.rs:
##########
@@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> {
             values: self.values.clone(),
         }
     }
+    /// Expands the REE array to its logical form
+    pub fn expand_to_logical<T: ArrowPrimitiveType>(&self) -> Result<Box<dyn 
Array>, ArrowError> 
+    where
+        T::Native: Default,
+    {
+        let typed_ree = self.downcast::<PrimitiveArray<T>>()
+            .ok_or_else(|| ArrowError::InvalidArgumentError("Failed to 
downcast to typed REE".to_string()))?;
+        
+        let mut builder = PrimitiveArray::<T>::builder(typed_ree.len());
+        for i in 0..typed_ree.len() {
+            if typed_ree.is_null(i) {
+                builder.append_null();
+            } else {
+                builder.append_value(typed_ree.value(i));
+            }
+        }
+        Ok(Box::new(builder.finish()))
+    }
+    /// Unwraps a REE array into a logical array
+    pub fn unwrap_ree_array(array: &dyn Array) -> Option<Box<dyn Array>> {

Review Comment:
   the word `unwrap` in Rust is usually referred to "unwrapping" a Result or an 
Option in other to get the inner value, panicking if it's not possible.
   
   Maybe we can find another word for this function less subject to be mistaken 
with that behavior.



##########
arrow-array/src/array/mod.rs:
##########
@@ -68,6 +68,9 @@ mod run_array;
 
 pub use run_array::*;
 
+// Re-export the unwrap_ree_array function for public use
+pub use run_array::unwrap_ree_array;

Review Comment:
   This comment seems like it's pretty much just stating how the Rust compiler 
works rather than adding extra information not present in the code itself. If 
you ask me, I would just remove it.



##########
arrow-array/src/array/run_array.rs:
##########
@@ -251,6 +247,47 @@ impl<R: RunEndIndexType> RunArray<R> {
             values: self.values.clone(),
         }
     }
+    /// Expands the REE array to its logical form

Review Comment:
   ```suggestion
       }
       
       /// Expands the REE array to its logical form
   ```
   All the other functions in this file seem to be leaving a space between 
them. I would try to be consistent with the previous formatting of the file



##########
arrow-ord/src/lib.rs:
##########
@@ -49,6 +49,7 @@
 )]
 #![cfg_attr(docsrs, feature(doc_auto_cfg))]
 #![warn(missing_docs)]
+pub use arrow_array::downcast_primitive_array;

Review Comment:
   Do you think this needs to be made public? it looks unrelated to REE. Maybe 
you want it to be public just inside the crate? `pub(crate) use ...`



##########
arrow-ord/src/cmp.rs:
##########
@@ -224,6 +223,14 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> 
Result<BooleanArray,
     let l_nulls = l.logical_nulls();
     let r_nulls = r.logical_nulls();
 
+    // 1. Handle REE arrays first - extract logical values
+    use arrow_array::unwrap_ree_array;
+    let l_ree = unwrap_ree_array(l);
+    let r_ree = unwrap_ree_array(r);

Review Comment:
   I see that for we are always expanding REE arrays for comparing them, but 
wouldn't it be nice to not expand them and compare them in the compacted form 
if both comparison operands are REE?
   
   I imagine that it should not be very difficult to do, and I expect it to be 
very good for performance. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to