Copilot commented on code in PR #22248:
URL: https://github.com/apache/datafusion/pull/22248#discussion_r3252891282
##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
}
fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
- let data_type = inputs[0].data_type();
- Interval::make_unbounded(&data_type)
+ let [input] = inputs else {
+ let data_type = inputs
+ .first()
+ .map(|i| i.data_type())
+ .unwrap_or(DataType::Float64);
+ return Interval::make_unbounded(&data_type);
Review Comment:
`evaluate_bounds` silently returns an unbounded interval when called with
the wrong arity. Since `ceil` is defined as unary, receiving anything other
than 1 input indicates a programming/planning error; returning unbounded can
hide the problem and degrade estimates in non-obvious ways. Prefer returning an
explicit error (e.g., internal error: expected 1 input) for non-1 arity.
##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
}
fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
- let data_type = inputs[0].data_type();
- Interval::make_unbounded(&data_type)
+ let [input] = inputs else {
+ let data_type = inputs
+ .first()
+ .map(|i| i.data_type())
+ .unwrap_or(DataType::Float64);
+ return Interval::make_unbounded(&data_type);
+ };
+ let data_type = input.data_type();
+ match (ceil_scalar(input.lower()), ceil_scalar(input.upper())) {
+ (Some(lo), Some(hi)) => Interval::try_new(lo, hi)
+ .or_else(|_| Interval::make_unbounded(&data_type)),
+ _ => Interval::make_unbounded(&data_type),
+ }
+ }
+
+ fn propagate_constraints(
+ &self,
+ interval: &Interval,
+ inputs: &[&Interval],
+ ) -> Result<Option<Vec<Interval>>> {
+ let [input_interval] = inputs else {
+ return Ok(Some(inputs.iter().map(|i| (*i).clone()).collect()));
+ };
Review Comment:
`propagate_constraints` also silently accepts non-1 arity by returning
unchanged intervals. For a unary function, this should be treated as an
internal error to avoid masking bugs and producing silently-wrong or
unexpectedly-weak constraint propagation.
##########
datafusion/physical-expr/src/intervals/utils.rs:
##########
@@ -56,6 +57,12 @@ pub fn check_support(expr: &Arc<dyn PhysicalExpr>, schema:
&SchemaRef) -> bool {
check_support(cast.expr(), schema)
} else if let Some(negative) = expr.downcast_ref::<NegativeExpr>() {
check_support(negative.arg(), schema)
+ } else if let Some(scalar_fn) = expr.downcast_ref::<ScalarFunctionExpr>() {
+ is_datatype_supported(scalar_fn.return_type())
+ && scalar_fn
+ .args()
+ .iter()
+ .all(|arg| check_support(arg, schema))
Review Comment:
`check_support` now permits all `ScalarFunctionExpr` with supported
return/arg types, but interval analysis is generally only sound for
deterministic (e.g., Immutable) functions. Allowing volatile/stable UDFs here
can lead to incorrect interval deductions and therefore misleading
statistics/constraint propagation. Consider additionally gating this branch on
the UDF’s volatility (e.g., only `Volatility::Immutable`, or at least excluding
`Volatility::Volatile`).
##########
datafusion/functions/src/math/ceil.rs:
##########
@@ -191,11 +191,70 @@ impl ScalarUDFImpl for CeilFunc {
}
fn evaluate_bounds(&self, inputs: &[&Interval]) -> Result<Interval> {
- let data_type = inputs[0].data_type();
- Interval::make_unbounded(&data_type)
+ let [input] = inputs else {
+ let data_type = inputs
+ .first()
+ .map(|i| i.data_type())
+ .unwrap_or(DataType::Float64);
+ return Interval::make_unbounded(&data_type);
+ };
+ let data_type = input.data_type();
+ match (ceil_scalar(input.lower()), ceil_scalar(input.upper())) {
+ (Some(lo), Some(hi)) => Interval::try_new(lo, hi)
+ .or_else(|_| Interval::make_unbounded(&data_type)),
+ _ => Interval::make_unbounded(&data_type),
+ }
+ }
+
+ fn propagate_constraints(
+ &self,
+ interval: &Interval,
+ inputs: &[&Interval],
+ ) -> Result<Option<Vec<Interval>>> {
+ let [input_interval] = inputs else {
+ return Ok(Some(inputs.iter().map(|i| (*i).clone()).collect()));
+ };
+ // ceil(x) ∈ [N, M] → x ∈ (N−1, M] — conservative closed: [N−1, M]
+ let lo = match interval.lower() {
+ ScalarValue::Float64(Some(n)) if n.is_finite() => {
+ Some(ScalarValue::Float64(Some(n - 1.0)))
+ }
+ ScalarValue::Float32(Some(n)) if n.is_finite() => {
+ Some(ScalarValue::Float32(Some(n - 1.0)))
+ }
+ _ => None,
+ };
+ let hi = match interval.upper() {
+ ScalarValue::Float64(Some(n)) if n.is_finite() => {
+ Some(ScalarValue::Float64(Some(*n)))
+ }
+ ScalarValue::Float32(Some(n)) if n.is_finite() => {
+ Some(ScalarValue::Float32(Some(*n)))
Review Comment:
The propagated bounds are currently quite loose when the constraint interval
bounds are non-integers (even though `ceil` results are integer-valued floats).
For example, `ceil(x) <= 12.3` is equivalent to `ceil(x) <= 12`, implying `x <=
12`, but the current logic yields `x <= 12.3`. Consider tightening by first
normalizing result bounds to integers (e.g., lower bound -> `ceil(lower)`,
upper bound -> `floor(upper)`) before mapping to the input domain; this
improves selectivity estimates without sacrificing soundness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]