alamb commented on code in PR #20008:
URL: https://github.com/apache/datafusion/pull/20008#discussion_r2733915208
##########
datafusion/expr/src/udf.rs:
##########
@@ -709,20 +709,49 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send +
Sync {
Ok(ExprSimplifyResult::Original(args))
}
- /// Returns the [preimage] for this function and the specified scalar
value, if any.
+ /// Returns a single contiguous preimage for this function and the
specified
+ /// scalar expression, if any.
///
- /// A preimage is a single contiguous [`Interval`] of values where the
function
- /// will always return `lit_value`
+ /// # Return Value
///
- /// Implementations should return intervals with an inclusive lower bound
and
- /// exclusive upper bound.
+ /// Implementations should return a half-open interval: inclusive lower
+ /// bound and exclusive upper bound. Note that this is slightly different
+ /// from normal [`Interval`] semantics where the upper bound is closed. The
+ /// upper endpoint should be adjusted to the next value.
///
- /// This rewrite is described in the [ClickHouse Paper] and is particularly
- /// useful for simplifying expressions `date_part` or equivalent
functions. The
- /// idea is that if you have an expression like `date_part(YEAR, k) =
2024` and you
- /// can find a [preimage] for `date_part(YEAR, k)`, which is the range of
dates
- /// covering the entire year of 2024. Thus, you can rewrite the expression
to `k
- /// >= '2024-01-01' AND k < '2025-01-01' which is often more optimizable.
+ /// # Background
+ ///
+ /// A [preimage] here is a single contiguous [`Interval`] of the function's
+ /// argument(s) where the function will return a single literal (constant)
+ /// value. This can also be thought of as a form of interval containment.
+ ///
+ /// Using a preimage to rewrite predicates is described in the [ClickHouse
+ /// Paper]:
+ ///
+ /// > some functions can compute the preimage of a given function result.
+ /// > This is used to replace comparisons of constants with function calls
+ /// > on the key columns by comparing the key column value with the
preimage.
+ /// > For example, `toYear(k) = 2024` can be replaced by
+ /// > `k >= 2024-01-01 && k < 2025-01-01`
+ ///
+ /// As mentioned above, the preimage can be used to simplify certain types
of
+ /// expressions such as `date_part` into a form that is more optimizable.
+ ///
+ /// For example, given an expression like
+ /// ```sql
+ /// date_part(YEAR, k) = 2024
+ /// ```
+ ///
+ /// There is a single preimage [`2024-01-01`, `2025-01-01`), which is the
+ /// range of dates covering the entire year of 2024 for which
+ /// `date_part(YEAR, k)` evaluates to `2024`. Using this preimage the
+ /// expression can be rewritten to
Review Comment:
As I understand it, the rewrite as currently implemented makes the
assumption that the interval must be contiguous, even though in theory this
same approach could be applied to non-contiguous ranges
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]