Re: [PR] feat: extend check_support to include ScalarFunctionExpr in interval analysis [datafusion]

via GitHub Sat, 16 May 2026 06:53:28 -0700


Copilot commented on code in PR #22248:
URL: https://github.com/apache/datafusion/pull/22248#discussion_r3252891282



##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
     }
 
     fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
-        let data_type = inputs[0].data_type();
-        Interval::make_unbounded(&data_type)
+        let [input] = inputs else {
+            let data_type = inputs
+                .first()
+                .map(|i| i.data_type())
+                .unwrap_or(DataType::Float64);
+            return Interval::make_unbounded(&data_type);

Review Comment:
   `evaluate_bounds` silently returns an unbounded interval when called with 
the wrong arity. Since `ceil` is defined as unary, receiving anything other 
than 1 input indicates a programming/planning error; returning unbounded can 
hide the problem and degrade estimates in non-obvious ways. Prefer returning an 
explicit error (e.g., internal error: expected 1 input) for non-1 arity.
   



##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
     }
 
     fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
-        let data_type = inputs[0].data_type();
-        Interval::make_unbounded(&data_type)
+        let [input] = inputs else {
+            let data_type = inputs
+                .first()
+                .map(|i| i.data_type())
+                .unwrap_or(DataType::Float64);
+            return Interval::make_unbounded(&data_type);
+        };
+        let data_type = input.data_type();
+        match (ceil_scalar(input.lower()), ceil_scalar(input.upper())) {
+            (Some(lo), Some(hi)) => Interval::try_new(lo, hi)
+                .or_else(|_| Interval::make_unbounded(&data_type)),
+            _ => Interval::make_unbounded(&data_type),
+        }
+    }
+
+    fn propagate_constraints(
+        &self,
+        interval: &Interval,
+        inputs: &[&Interval],
+    ) -> Result<Option<Vec<Interval>>> {
+        let [input_interval] = inputs else {
+            return Ok(Some(inputs.iter().map(|i| (*i).clone()).collect()));
+        };

Review Comment:
   `propagate_constraints` also silently accepts non-1 arity by returning 
unchanged intervals. For a unary function, this should be treated as an 
internal error to avoid masking bugs and producing silently-wrong or 
unexpectedly-weak constraint propagation.



##########
datafusion/physical-expr/src/intervals/utils.rs:
##########
@@ -56,6 +57,12 @@ pub fn check_support(expr: &Arc<dyn PhysicalExpr>, schema: 
&SchemaRef) -> bool {
         check_support(cast.expr(), schema)
     } else if let Some(negative) = expr.downcast_ref::<NegativeExpr>() {
         check_support(negative.arg(), schema)
+    } else if let Some(scalar_fn) = expr.downcast_ref::<ScalarFunctionExpr>() {
+        is_datatype_supported(scalar_fn.return_type())
+            && scalar_fn
+                .args()
+                .iter()
+                .all(|arg| check_support(arg, schema))

Review Comment:
   `check_support` now permits all `ScalarFunctionExpr` with supported 
return/arg types, but interval analysis is generally only sound for 
deterministic (e.g., Immutable) functions. Allowing volatile/stable UDFs here 
can lead to incorrect interval deductions and therefore misleading 
statistics/constraint propagation. Consider additionally gating this branch on 
the UDF’s volatility (e.g., only `Volatility::Immutable`, or at least excluding 
`Volatility::Volatile`).



##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
     }
 
     fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
-        let data_type = inputs[0].data_type();
-        Interval::make_unbounded(&data_type)
+        let [input] = inputs else {
+            let data_type = inputs
+                .first()
+                .map(|i| i.data_type())
+                .unwrap_or(DataType::Float64);
+            return Interval::make_unbounded(&data_type);
+        };
+        let data_type = input.data_type();
+        match (ceil_scalar(input.lower()), ceil_scalar(input.upper())) {
+            (Some(lo), Some(hi)) => Interval::try_new(lo, hi)
+                .or_else(|_| Interval::make_unbounded(&data_type)),
+            _ => Interval::make_unbounded(&data_type),
+        }
+    }
+
+    fn propagate_constraints(
+        &self,
+        interval: &Interval,
+        inputs: &[&Interval],
+    ) -> Result<Option<Vec<Interval>>> {
+        let [input_interval] = inputs else {
+            return Ok(Some(inputs.iter().map(|i| (*i).clone()).collect()));
+        };
+        // ceil(x) ∈ [N, M] → x ∈ (N−1, M] — conservative closed: [N−1, M]
+        let lo = match interval.lower() {
+            ScalarValue::Float64(Some(n)) if n.is_finite() => {
+                Some(ScalarValue::Float64(Some(n - 1.0)))
+            }
+            ScalarValue::Float32(Some(n)) if n.is_finite() => {
+                Some(ScalarValue::Float32(Some(n - 1.0)))
+            }
+            _ => None,
+        };
+        let hi = match interval.upper() {
+            ScalarValue::Float64(Some(n)) if n.is_finite() => {
+                Some(ScalarValue::Float64(Some(*n)))
+            }
+            ScalarValue::Float32(Some(n)) if n.is_finite() => {
+                Some(ScalarValue::Float32(Some(*n)))

Review Comment:
   The propagated bounds are currently quite loose when the constraint interval 
bounds are non-integers (even though `ceil` results are integer-valued floats). 
For example, `ceil(x) <= 12.3` is equivalent to `ceil(x) <= 12`, implying `x <= 
12`, but the current logic yields `x <= 12.3`. Consider tightening by first 
normalizing result bounds to integers (e.g., lower bound -> `ceil(lower)`, 
upper bound -> `floor(upper)`) before mapping to the input domain; this 
improves selectivity estimates without sacrificing soundness.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: extend check_support to include ScalarFunctionExpr in interval analysis [datafusion]

Reply via email to