alamb commented on code in PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912441506
##########
datafusion/physical-plan/src/joins/cross_join.rs:
##########
@@ -636,18 +661,25 @@ mod tests {
distinct_count: Precision::Exact(5),
max_value: Precision::Exact(ScalarValue::Int64(Some(21))),
min_value: Precision::Exact(ScalarValue::Int64(Some(-4))),
+ sum_value: Precision::Exact(ScalarValue::Int64(Some(
+ 42 * right_row_count as i64,
Review Comment:
👍
##########
datafusion/common/src/stats.rs:
##########
@@ -436,6 +492,8 @@ pub struct ColumnStatistics {
pub max_value: Precision<ScalarValue>,
/// Minimum value of column
pub min_value: Precision<ScalarValue>,
+ /// Sum value of a column
+ pub sum_value: Precision<ScalarValue>,
Review Comment:
As I think we mentioned in https://github.com/apache/datafusion/pull/13736
my only real concern with this addition is that it will make `ColumnStatistics`
even bigger (each ScalarValue is quite large already and `ColumnStatistics` are
copied a bunch
However, I think the "right" fix for that is to move to using a different
statistics representation (e.g. `Arc::ColumnStatistics`) so I don't see this as
a blocker
##########
datafusion/common/src/stats.rs:
##########
@@ -170,24 +170,63 @@ impl Precision<ScalarValue> {
pub fn add(&self, other: &Precision<ScalarValue>) ->
Precision<ScalarValue> {
match (self, other) {
(Precision::Exact(a), Precision::Exact(b)) => {
- if let Ok(result) = a.add(b) {
- Precision::Exact(result)
- } else {
- Precision::Absent
- }
+ a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
}
(Precision::Inexact(a), Precision::Exact(b))
| (Precision::Exact(a), Precision::Inexact(b))
- | (Precision::Inexact(a), Precision::Inexact(b)) => {
- if let Ok(result) = a.add(b) {
- Precision::Inexact(result)
- } else {
- Precision::Absent
- }
+ | (Precision::Inexact(a), Precision::Inexact(b)) => a
+ .add(b)
+ .map(Precision::Inexact)
+ .unwrap_or(Precision::Absent),
+ (_, _) => Precision::Absent,
+ }
+ }
+
+ /// Calculates the difference of two (possibly inexact) [`ScalarValue`]
values,
+ /// conservatively propagating exactness information. If one of the input
+ /// values is [`Precision::Absent`], the result is `Absent` too.
+ pub fn sub(&self, other: &Precision<ScalarValue>) ->
Precision<ScalarValue> {
+ match (self, other) {
+ (Precision::Exact(a), Precision::Exact(b)) => {
+ a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
}
+ (Precision::Inexact(a), Precision::Exact(b))
+ | (Precision::Exact(a), Precision::Inexact(b))
+ | (Precision::Inexact(a), Precision::Inexact(b)) => a
+ .add(b)
+ .map(Precision::Inexact)
+ .unwrap_or(Precision::Absent),
+ (_, _) => Precision::Absent,
+ }
+ }
+
+ /// Calculates the multiplication of two (possibly inexact)
[`ScalarValue`] values,
+ /// conservatively propagating exactness information. If one of the input
+ /// values is [`Precision::Absent`], the result is `Absent` too.
+ pub fn multiply(&self, other: &Precision<ScalarValue>) ->
Precision<ScalarValue> {
+ match (self, other) {
+ (Precision::Exact(a), Precision::Exact(b)) => a
+ .mul_checked(b)
+ .map(Precision::Exact)
+ .unwrap_or(Precision::Absent),
+ (Precision::Inexact(a), Precision::Exact(b))
+ | (Precision::Exact(a), Precision::Inexact(b))
+ | (Precision::Inexact(a), Precision::Inexact(b)) => a
+ .mul_checked(b)
+ .map(Precision::Inexact)
+ .unwrap_or(Precision::Absent),
(_, _) => Precision::Absent,
}
}
+
+ /// Casts the value to the given data type, propagating exactness
information.
+ pub fn cast_to(&self, data_type: &DataType) ->
Result<Precision<ScalarValue>> {
+ match self {
+ Precision::Exact(value) =>
value.cast_to(data_type).map(Precision::Exact),
+ Precision::Inexact(value) =>
value.cast_to(data_type).map(Precision::Inexact),
+ Precision::Absent => Ok(Precision::Absent),
+ }
+ }
Review Comment:
Could you please add some unit test coverage for these new functions?
I see they are covered in the cross join cardinality calculation tests but I
think some targeted unit tests would be very helpful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]