alamb commented on code in PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912441506


##########
datafusion/physical-plan/src/joins/cross_join.rs:
##########
@@ -636,18 +661,25 @@ mod tests {
                     distinct_count: Precision::Exact(5),
                     max_value: Precision::Exact(ScalarValue::Int64(Some(21))),
                     min_value: Precision::Exact(ScalarValue::Int64(Some(-4))),
+                    sum_value: Precision::Exact(ScalarValue::Int64(Some(
+                        42 * right_row_count as i64,

Review Comment:
   👍 



##########
datafusion/common/src/stats.rs:
##########
@@ -436,6 +492,8 @@ pub struct ColumnStatistics {
     pub max_value: Precision<ScalarValue>,
     /// Minimum value of column
     pub min_value: Precision<ScalarValue>,
+    /// Sum value of a column
+    pub sum_value: Precision<ScalarValue>,

Review Comment:
   As I think we mentioned in https://github.com/apache/datafusion/pull/13736 
my only real concern with this addition is that it will make `ColumnStatistics` 
even bigger (each ScalarValue is quite large already and `ColumnStatistics` are 
copied a bunch
   
   However, I think the "right" fix for that is to move to using a different 
statistics representation (e.g. `Arc::ColumnStatistics`) so I don't see this as 
a blocker



##########
datafusion/common/src/stats.rs:
##########
@@ -170,24 +170,63 @@ impl Precision<ScalarValue> {
     pub fn add(&self, other: &Precision<ScalarValue>) -> 
Precision<ScalarValue> {
         match (self, other) {
             (Precision::Exact(a), Precision::Exact(b)) => {
-                if let Ok(result) = a.add(b) {
-                    Precision::Exact(result)
-                } else {
-                    Precision::Absent
-                }
+                a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
             }
             (Precision::Inexact(a), Precision::Exact(b))
             | (Precision::Exact(a), Precision::Inexact(b))
-            | (Precision::Inexact(a), Precision::Inexact(b)) => {
-                if let Ok(result) = a.add(b) {
-                    Precision::Inexact(result)
-                } else {
-                    Precision::Absent
-                }
+            | (Precision::Inexact(a), Precision::Inexact(b)) => a
+                .add(b)
+                .map(Precision::Inexact)
+                .unwrap_or(Precision::Absent),
+            (_, _) => Precision::Absent,
+        }
+    }
+
+    /// Calculates the difference of two (possibly inexact) [`ScalarValue`] 
values,
+    /// conservatively propagating exactness information. If one of the input
+    /// values is [`Precision::Absent`], the result is `Absent` too.
+    pub fn sub(&self, other: &Precision<ScalarValue>) -> 
Precision<ScalarValue> {
+        match (self, other) {
+            (Precision::Exact(a), Precision::Exact(b)) => {
+                a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
             }
+            (Precision::Inexact(a), Precision::Exact(b))
+            | (Precision::Exact(a), Precision::Inexact(b))
+            | (Precision::Inexact(a), Precision::Inexact(b)) => a
+                .add(b)
+                .map(Precision::Inexact)
+                .unwrap_or(Precision::Absent),
+            (_, _) => Precision::Absent,
+        }
+    }
+
+    /// Calculates the multiplication of two (possibly inexact) 
[`ScalarValue`] values,
+    /// conservatively propagating exactness information. If one of the input
+    /// values is [`Precision::Absent`], the result is `Absent` too.
+    pub fn multiply(&self, other: &Precision<ScalarValue>) -> 
Precision<ScalarValue> {
+        match (self, other) {
+            (Precision::Exact(a), Precision::Exact(b)) => a
+                .mul_checked(b)
+                .map(Precision::Exact)
+                .unwrap_or(Precision::Absent),
+            (Precision::Inexact(a), Precision::Exact(b))
+            | (Precision::Exact(a), Precision::Inexact(b))
+            | (Precision::Inexact(a), Precision::Inexact(b)) => a
+                .mul_checked(b)
+                .map(Precision::Inexact)
+                .unwrap_or(Precision::Absent),
             (_, _) => Precision::Absent,
         }
     }
+
+    /// Casts the value to the given data type, propagating exactness 
information.
+    pub fn cast_to(&self, data_type: &DataType) -> 
Result<Precision<ScalarValue>> {
+        match self {
+            Precision::Exact(value) => 
value.cast_to(data_type).map(Precision::Exact),
+            Precision::Inexact(value) => 
value.cast_to(data_type).map(Precision::Inexact),
+            Precision::Absent => Ok(Precision::Absent),
+        }
+    }

Review Comment:
   Could you please add some unit test coverage for these new functions? 
   
   I see they are covered in the cross join cardinality calculation tests but I 
think some targeted unit tests would be very helpful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to