This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new fdd36d0d21 Update comments on OptimizerRule about function name
matching (#20346)
fdd36d0d21 is described below
commit fdd36d0d2198d1701fffdb02901b20e0611b5468
Author: Andrew Lamb <[email protected]>
AuthorDate: Tue Feb 24 14:46:32 2026 -0500
Update comments on OptimizerRule about function name matching (#20346)
## Which issue does this PR close?
- Related to https://github.com/apache/datafusion/pull/20180
## Rationale for this change
I gave feedback to @devanshu0987
https://github.com/apache/datafusion/pull/20180/changes#r2800720037 that
it was not a good idea to check for function names in optimizer rules,
but then I realized that the rationale for this is not written down
anywhere.
## What changes are included in this PR?
Document why checking for function names in optimizer rules is not good
and offer alternatives
## Are these changes tested?
By CI
## Are there any user-facing changes?
Just docs, no functional changes
---
datafusion/optimizer/src/optimizer.rs | 46 +++++++++++++++++++++++++++++------
1 file changed, 39 insertions(+), 7 deletions(-)
diff --git a/datafusion/optimizer/src/optimizer.rs
b/datafusion/optimizer/src/optimizer.rs
index 0da4d6352a..bdea6a8307 100644
--- a/datafusion/optimizer/src/optimizer.rs
+++ b/datafusion/optimizer/src/optimizer.rs
@@ -58,12 +58,12 @@ use crate::simplify_expressions::SimplifyExpressions;
use crate::single_distinct_to_groupby::SingleDistinctToGroupBy;
use crate::utils::log_plan;
-/// `OptimizerRule`s transforms one [`LogicalPlan`] into another which
-/// computes the same results, but in a potentially more efficient
-/// way. If there are no suitable transformations for the input plan,
-/// the optimizer should simply return it unmodified.
+/// Transforms one [`LogicalPlan`] into another which computes the same
results,
+/// but in a potentially more efficient way.
///
-/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`]
+/// See notes on [`Self::rewrite`] for details on how to implement an
`OptimizerRule`.
+///
+/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`].
///
/// Use [`SessionState::add_optimizer_rule`] to register additional
/// `OptimizerRule`s.
@@ -88,8 +88,40 @@ pub trait OptimizerRule: Debug {
true
}
- /// Try to rewrite `plan` to an optimized form, returning
`Transformed::yes`
- /// if the plan was rewritten and `Transformed::no` if it was not.
+ /// Try to rewrite `plan` to an optimized form, returning
[`Transformed::yes`]
+ /// if the plan was rewritten and [`Transformed::no`] if it was not.
+ ///
+ /// # Notes for implementations:
+ ///
+ /// ## Return the same plan if no changes were made
+ ///
+ /// If there are no suitable transformations for the input plan,
+ /// the optimizer should simply return it unmodified.
+ ///
+ /// The optimizer will call `rewrite` several times until a fixed point is
+ /// reached, so it is important that `rewrite` return [`Transformed::no`]
if
+ /// the output is the same.
+ ///
+ /// ## Matching on functions
+ ///
+ /// The rule should avoid function-specific transformations, and instead
use
+ /// methods on [`ScalarUDFImpl`] and [`AggregateUDFImpl`]. Specifically,
the
+ /// rule should not check function names as functions can be overridden,
and
+ /// may not have the same semantics as the functions provided with
+ /// DataFusion.
+ ///
+ /// For example, if a rule rewrites a function based on the check
+ /// `func.name() == "sum"`, it may rewrite the plan incorrectly if the
+ /// registered `sum` function has different semantics (for example, the
+ /// `sum` function from the `datafusion-spark` crate).
+ ///
+ /// There are still several cases that rely on function name checking in
+ /// the rules included with DataFusion. Please see [#18643] for more
details
+ /// and to help remove these cases.
+ ///
+ /// [`ScalarUDFImpl`]: datafusion_expr::ScalarUDFImpl
+ /// [`AggregateUDFImpl`]: datafusion_expr::ScalarUDFImpl
+ /// [#18643]: https://github.com/apache/datafusion/issues/18643
fn rewrite(
&self,
_plan: LogicalPlan,
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]