This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new f9457de779 Move Datafusion Query Optimizer to library user guide (#11563)
f9457de779 is described below

commit f9457de779e213f610fc92dd9165076c7ee770a2
Author: Devesh Rahatekar <[email protected]>
AuthorDate: Mon Jul 22 18:01:52 2024 +0530

    Move Datafusion Query Optimizer to library user guide (#11563)
    
    * Added Datafusion Query Optimizer to user guide
    
    * Updated Query optimizer name, Added to index and replaced the README content
    
    * Fix RAT check
    
    ---------
    
    Co-authored-by: Andrew Lamb <[email protected]>
---
 datafusion/optimizer/README.md                     | 318 +--------------------
 docs/source/index.rst                              |   2 +-
 .../source/library-user-guide/query-optimizer.md   |   0
 3 files changed, 3 insertions(+), 317 deletions(-)

diff --git a/datafusion/optimizer/README.md b/datafusion/optimizer/README.md
index 5aacfaf59c..61bc1cd701 100644
--- a/datafusion/optimizer/README.md
+++ b/datafusion/optimizer/README.md
@@ -17,320 +17,6 @@
   under the License.
 -->
 
-# DataFusion Query Optimizer
+Please see [Query Optimizer] in the Library User Guide
 
-[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory
-format.
-
-DataFusion has modular design, allowing individual crates to be re-used in other projects.
-
-This crate is a submodule of DataFusion that provides a query optimizer for logical plans, and
-contains an extensive set of OptimizerRules that may rewrite the plan and/or its expressions so
-they execute more quickly while still computing the same result.
-
-## Running the Optimizer
-
-The following code demonstrates the basic flow of creating the optimizer with a default set of optimization rules
-and applying it to a logical plan to produce an optimized logical plan.
-
-```rust
-
-// We need a logical plan as the starting point. There are many ways to build a logical plan:
-//
-// The `datafusion-expr` crate provides a LogicalPlanBuilder
-// The `datafusion-sql` crate provides a SQL query planner that can create a LogicalPlan from SQL
-// The `datafusion` crate provides a DataFrame API that can create a LogicalPlan
-let logical_plan = ...
-
-let mut config = OptimizerContext::default();
-let optimizer = Optimizer::new(&config);
-let optimized_plan = optimizer.optimize(&logical_plan, &config, observe)?;
-
-fn observe(plan: &LogicalPlan, rule: &dyn OptimizerRule) {
-    println!(
-        "After applying rule '{}':\n{}",
-        rule.name(),
-        plan.display_indent()
-    )
-}
-```
-
-## Providing Custom Rules
-
-The optimizer can be created with a custom set of rules.
-
-```rust
-let optimizer = Optimizer::with_rules(vec![
-    Arc::new(MyRule {})
-]);
-```
-
-## Writing Optimization Rules
-
-Please refer to the
-[optimizer_rule.rs](../../datafusion-examples/examples/optimizer_rule.rs)
-example to learn more about the general approach to writing optimizer rules and
-then move onto studying the existing rules.
-
-All rules must implement the `OptimizerRule` trait.
-
-```rust
-/// `OptimizerRule` transforms one ['LogicalPlan'] into another which
-/// computes the same results, but in a potentially more efficient
-/// way. If there are no suitable transformations for the input plan,
-/// the optimizer can simply return it as is.
-pub trait OptimizerRule {
-    /// Rewrite `plan` to an optimized form
-    fn optimize(
-        &self,
-        plan: &LogicalPlan,
-        config: &dyn OptimizerConfig,
-    ) -> Result<LogicalPlan>;
-
-    /// A human readable name for this optimizer rule
-    fn name(&self) -> &str;
-}
-```
-
-### General Guidelines
-
-Rules typical walk the logical plan and walk the expression trees inside operators and selectively mutate
-individual operators or expressions.
-
-Sometimes there is an initial pass that visits the plan and builds state that is used in a second pass that performs
-the actual optimization. This approach is used in projection push down and filter push down.
-
-### Expression Naming
-
-Every expression in DataFusion has a name, which is used as the column name. For example, in this example the output
-contains a single column with the name `"COUNT(aggregate_test_100.c9)"`:
-
-```text
-> select count(c9) from aggregate_test_100;
-+------------------------------+
-| COUNT(aggregate_test_100.c9) |
-+------------------------------+
-| 100                          |
-+------------------------------+
-```
-
-These names are used to refer to the columns in both subqueries as well as internally from one stage of the LogicalPlan
-to another. For example:
-
-```text
-> select "COUNT(aggregate_test_100.c9)" + 1 from (select count(c9) from aggregate_test_100) as sq;
-+--------------------------------------------+
-| sq.COUNT(aggregate_test_100.c9) + Int64(1) |
-+--------------------------------------------+
-| 101                                        |
-+--------------------------------------------+
-```
-
-### Implication
-
-Because DataFusion identifies columns using a string name, it means it is critical that the names of expressions are
-not changed by the optimizer when it rewrites expressions. This is typically accomplished by renaming a rewritten
-expression by adding an alias.
-
-Here is a simple example of such a rewrite. The expression `1 + 2` can be internally simplified to 3 but must still be
-displayed the same as `1 + 2`:
-
-```text
-> select 1 + 2;
-+---------------------+
-| Int64(1) + Int64(2) |
-+---------------------+
-| 3                   |
-+---------------------+
-```
-
-Looking at the `EXPLAIN` output we can see that the optimizer has effectively rewritten `1 + 2` into effectively
-`3 as "1 + 2"`:
-
-```text
-> explain select 1 + 2;
-+---------------+-------------------------------------------------+
-| plan_type     | plan                                            |
-+---------------+-------------------------------------------------+
-| logical_plan  | Projection: Int64(3) AS Int64(1) + Int64(2)     |
-|               |   EmptyRelation                                 |
-| physical_plan | ProjectionExec: expr=[3 as Int64(1) + Int64(2)] |
-|               |   PlaceholderRowExec                            |
-|               |                                                 |
-+---------------+-------------------------------------------------+
-```
-
-If the expression name is not preserved, bugs such as [#3704](https://github.com/apache/datafusion/issues/3704)
-and [#3555](https://github.com/apache/datafusion/issues/3555) occur where the expected columns can not be found.
-
-### Building Expression Names
-
-There are currently two ways to create a name for an expression in the logical plan.
-
-```rust
-impl Expr {
-    /// Returns the name of this expression as it should appear in a schema. This name
-    /// will not include any CAST expressions.
-    pub fn display_name(&self) -> Result<String> {
-        create_name(self)
-    }
-
-    /// Returns a full and complete string representation of this expression.
-    pub fn canonical_name(&self) -> String {
-        format!("{}", self)
-    }
-}
-```
-
-When comparing expressions to determine if they are equivalent, `canonical_name` should be used, and when creating a
-name to be used in a schema, `display_name` should be used.
-
-### Utilities
-
-There are a number of utility methods provided that take care of some common tasks.
-
-### ExprVisitor
-
-The `ExprVisitor` and `ExprVisitable` traits provide a mechanism for applying a visitor pattern to an expression tree.
-
-Here is an example that demonstrates this.
-
-```rust
-fn extract_subquery_filters(expression: &Expr, extracted: &mut Vec<Expr>) -> Result<()> {
-    struct InSubqueryVisitor<'a> {
-        accum: &'a mut Vec<Expr>,
-    }
-
-    impl ExpressionVisitor for InSubqueryVisitor<'_> {
-        fn pre_visit(self, expr: &Expr) -> Result<Recursion<Self>> {
-            if let Expr::InSubquery(_) = expr {
-                self.accum.push(expr.to_owned());
-            }
-            Ok(Recursion::Continue(self))
-        }
-    }
-
-    expression.accept(InSubqueryVisitor { accum: extracted })?;
-    Ok(())
-}
-```
-
-### Rewriting Expressions
-
-The `MyExprRewriter` trait can be implemented to provide a way to rewrite expressions. This rule can then be applied
-to an expression by calling `Expr::rewrite` (from the `ExprRewritable` trait).
-
-The `rewrite` method will perform a depth first walk of the expression and its children to rewrite an expression,
-consuming `self` producing a new expression.
-
-```rust
-let mut expr_rewriter = MyExprRewriter {};
-let expr = expr.rewrite(&mut expr_rewriter)?;
-```
-
-Here is an example implementation which will rewrite `expr BETWEEN a AND b` as `expr >= a AND expr <= b`. Note that the
-implementation does not need to perform any recursion since this is handled by the `rewrite` method.
-
-```rust
-struct MyExprRewriter {}
-
-impl ExprRewriter for MyExprRewriter {
-    fn mutate(&mut self, expr: Expr) -> Result<Expr> {
-        match expr {
-            Expr::Between {
-                negated,
-                expr,
-                low,
-                high,
-            } => {
-                let expr: Expr = expr.as_ref().clone();
-                let low: Expr = low.as_ref().clone();
-                let high: Expr = high.as_ref().clone();
-                if negated {
-                    Ok(expr.clone().lt(low).or(expr.clone().gt(high)))
-                } else {
-                    Ok(expr.clone().gt_eq(low).and(expr.clone().lt_eq(high)))
-                }
-            }
-            _ => Ok(expr.clone()),
-        }
-    }
-}
-```
-
-### optimize_children
-
-Typically a rule is applied recursively to all operators within a query plan. Rather than duplicate
-that logic in each rule, an `optimize_children` method is provided. This recursively invokes the `optimize` method on
-the plan's children and then returns a node of the same type.
-
-```rust
-fn optimize(
-    &self,
-    plan: &LogicalPlan,
-    _config: &mut OptimizerConfig,
-) -> Result<LogicalPlan> {
-    // recurse down and optimize children first
-    let plan = utils::optimize_children(self, plan, _config)?;
-
-    ...
-}
-```
-
-### Writing Tests
-
-There should be unit tests in the same file as the new rule that test the effect of the rule being applied to a plan
-in isolation (without any other rule being applied).
-
-There should also be a test in `integration-tests.rs` that tests the rule as part of the overall optimization process.
-
-### Debugging
-
-The `EXPLAIN VERBOSE` command can be used to show the effect of each optimization rule on a query.
-
-In the following example, the `type_coercion` and `simplify_expressions` passes have simplified the plan so that it returns the constant `"3.2"` rather than doing a computation at execution time.
-
-```text
-> explain verbose select cast(1 + 2.2 as string) as foo;
-+------------------------------------------------------------+---------------------------------------------------------------------------+
-| plan_type                                                  | plan                                                                      |
-+------------------------------------------------------------+---------------------------------------------------------------------------+
-| initial_logical_plan                                       | Projection: CAST(Int64(1) + Float64(2.2) AS Utf8) AS foo                  |
-|                                                            |   EmptyRelation                                                           |
-| logical_plan after type_coercion                           | Projection: CAST(CAST(Int64(1) AS Float64) + Float64(2.2) AS Utf8) AS foo |
-|                                                            |   EmptyRelation                                                           |
-| logical_plan after simplify_expressions                    | Projection: Utf8("3.2") AS foo                                            |
-|                                                            |   EmptyRelation                                                           |
-| logical_plan after unwrap_cast_in_comparison               | SAME TEXT AS ABOVE                                                        |
-| logical_plan after decorrelate_where_exists                | SAME TEXT AS ABOVE                                                        |
-| logical_plan after decorrelate_where_in                    | SAME TEXT AS ABOVE                                                        |
-| logical_plan after scalar_subquery_to_join                 | SAME TEXT AS ABOVE                                                        |
-| logical_plan after subquery_filter_to_join                 | SAME TEXT AS ABOVE                                                        |
-| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                        |
-| logical_plan after eliminate_filter                        | SAME TEXT AS ABOVE                                                        |
-| logical_plan after reduce_cross_join                       | SAME TEXT AS ABOVE                                                        |
-| logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                        |
-| logical_plan after eliminate_limit                         | SAME TEXT AS ABOVE                                                        |
-| logical_plan after projection_push_down                    | SAME TEXT AS ABOVE                                                        |
-| logical_plan after rewrite_disjunctive_predicate           | SAME TEXT AS ABOVE                                                        |
-| logical_plan after reduce_outer_join                       | SAME TEXT AS ABOVE                                                        |
-| logical_plan after filter_push_down                        | SAME TEXT AS ABOVE                                                        |
-| logical_plan after limit_push_down                         | SAME TEXT AS ABOVE                                                        |
-| logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS ABOVE                                                        |
-| logical_plan                                               | Projection: Utf8("3.2") AS foo                                            |
-|                                                            |   EmptyRelation                                                           |
-| initial_physical_plan                                      | ProjectionExec: expr=[3.2 as foo]                                         |
-|                                                            |   PlaceholderRowExec                                                      |
-|                                                            |                                                                           |
-| physical_plan after aggregate_statistics                   | SAME TEXT AS ABOVE                                                        |
-| physical_plan after join_selection                         | SAME TEXT AS ABOVE                                                        |
-| physical_plan after coalesce_batches                       | SAME TEXT AS ABOVE                                                        |
-| physical_plan after repartition                            | SAME TEXT AS ABOVE                                                        |
-| physical_plan after add_merge_exec                         | SAME TEXT AS ABOVE                                                        |
-| physical_plan                                              | ProjectionExec: expr=[3.2 as foo]                                         |
-|                                                            |   PlaceholderRowExec                                                      |
-|                                                            |                                                                           |
-+------------------------------------------------------------+---------------------------------------------------------------------------+
-```
-
-[df]: https://crates.io/crates/datafusion
+[query optimizer]: https://datafusion.apache.org/library-user-guide/query-optimizer.html
diff --git a/docs/source/index.rst b/docs/source/index.rst
index ca6905c434..9c8c886d25 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -107,7 +107,7 @@ To get started, see
    library-user-guide/custom-table-providers
    library-user-guide/extending-operators
    library-user-guide/profiling
-
+   library-user-guide/query-optimizer
 .. _toc.contributor-guide:
 
 .. toctree::
diff --git a/datafusion/optimizer/README.md b/docs/source/library-user-guide/query-optimizer.md
similarity index 100%
copy from datafusion/optimizer/README.md
copy to docs/source/library-user-guide/query-optimizer.md


