alamb commented on code in PR #19265:
URL: https://github.com/apache/datafusion/pull/19265#discussion_r2616949975


##########
docs/source/library-user-guide/extending-sql.md:
##########
@@ -0,0 +1,339 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Extending DataFusion's SQL Syntax
+
+DataFusion provides a flexible extension system that allows you to customize SQL
+parsing and planning without modifying the core codebase. This is useful when you
+need to:
+
+- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON)
+- Add custom data types not natively supported
+- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`
+
+## Architecture Overview
+
+When DataFusion processes a SQL query, it goes through these stages:
+
+```text
+┌─────────────┐    ┌─────────┐    ┌───────────────────────────────┐    ┌─────────────┐
+│ SQL String  │───▶│ Parser  │───▶│ SqlToRel (SQL to LogicalPlan) │───▶│ LogicalPlan │
+└─────────────┘    └─────────┘    └───────────────────────────────┘    └─────────────┘
+                                              │
+                                              │ uses
+                                              ▼
+                                  ┌───────────────────────┐
+                                  │  Extension Planners   │
+                                  │  • ExprPlanner        │
+                                  │  • TypePlanner        │
+                                  │  • RelationPlanner    │
+                                  └───────────────────────┘
+```
+
+The extension planners intercept specific parts of the SQL AST during the
+`SqlToRel` phase and allow you to customize how they are converted to DataFusion's
+logical plan.
+
+## Extension Points
+
+DataFusion provides three planner traits for extending SQL:
+
+| Trait             | Purpose                                 | Registration Method                        |
+| ----------------- | --------------------------------------- | ------------------------------------------ |
+| `ExprPlanner`     | Custom expressions and operators        | `ctx.register_expr_planner()`              |
+| `TypePlanner`     | Custom SQL data types                   | `SessionStateBuilder::with_type_planner()` |
+| `RelationPlanner` | Custom FROM clause elements (relations) | `ctx.register_relation_planner()`          |
+
+**Planner Precedence**: Multiple `ExprPlanner`s and `RelationPlanner`s can be
+registered; they are invoked in reverse registration order (last registered wins).
+Return `Original(...)` to delegate to the next planner. Only one `TypePlanner`
+can be active at a time.
+
+### ExprPlanner: Custom Expressions and Operators
+
+Use [`ExprPlanner`] to customize how SQL expressions are converted to DataFusion
+logical expressions. This is useful for:
+
+- Custom binary operators (e.g., `->`, `->>`, `@>`, `?`)
+- Custom field access patterns
+- Custom aggregate or window function handling
+
+#### Available Methods
+
+| Category           | Methods                                                                            |
+| ------------------ | ---------------------------------------------------------------------------------- |
+| Operators          | `plan_binary_op`, `plan_any`                                                       |
+| Literals           | `plan_array_literal`, `plan_dictionary_literal`, `plan_struct_literal`             |
+| Functions          | `plan_extract`, `plan_substring`, `plan_overlay`, `plan_position`, `plan_make_map` |
+| Identifiers        | `plan_field_access`, `plan_compound_identifier`                                    |
+| Aggregates/Windows | `plan_aggregate`, `plan_window`                                                    |
+
+See the [ExprPlanner API documentation] for full method signatures.
+
+#### Example: Custom Arrow Operator
+
+This example maps the `->` operator to string concatenation:
+
+```rust
+use std::sync::Arc;

Review Comment:
   I suggest adding `#` in front of some of these `use` lines to hide them from 
the documentation so it is easier to focus on the important methods. The same 
comment applies to the other sections below.
   
   Like 
   ```
   # use std::sync::Arc;
   ```
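To illustrate the suggestion: in rustdoc, a line inside a Rust code block that starts with `# ` is still compiled as part of the doctest but is left out of the rendered example. A minimal, self-contained sketch follows. `shared_label` is a hypothetical helper used only for illustration (it is not part of DataFusion), and whether the user-guide Markdown pipeline hides such lines the same way is an assumption that should be checked against the built docs.

```rust
use std::sync::Arc;

/// Returns a reference-counted copy of `label`.
///
/// (`shared_label` is a hypothetical helper for this sketch only.)
///
/// The `use` line in the example below starts with `# `, so rustdoc compiles
/// it with the doctest but omits it from the rendered documentation:
///
/// ```
/// # use std::sync::Arc;
/// let label: Arc<str> = Arc::from("MyCustomPlanner");
/// assert_eq!(&*label, "MyCustomPlanner");
/// ```
pub fn shared_label(label: &str) -> Arc<str> {
    // `Arc<str>` implements `From<&str>`, so this copies the string once.
    Arc::from(label)
}
```

Applied to the guide, only the boilerplate imports would get the `# ` prefix, leaving the trait implementation and the registration call visible.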



##########
docs/source/library-user-guide/extending-sql.md:
##########
@@ -0,0 +1,339 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Extending DataFusion's SQL Syntax
+
+DataFusion provides a flexible extension system that allows you to customize SQL
+parsing and planning without modifying the core codebase. This is useful when you
+need to:
+
+- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON)
+- Add custom data types not natively supported
+- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`
+
+## Architecture Overview
+
+When DataFusion processes a SQL query, it goes through these stages:
+
+```text
+┌─────────────┐    ┌─────────┐    ┌───────────────────────────────┐    ┌─────────────┐
+│ SQL String  │───▶│ Parser  │───▶│ SqlToRel (SQL to LogicalPlan) │───▶│ LogicalPlan │
+└─────────────┘    └─────────┘    └───────────────────────────────┘    └─────────────┘
+                                              │
+                                              │ uses
+                                              ▼
+                                  ┌───────────────────────┐
+                                  │  Extension Planners   │
+                                  │  • ExprPlanner        │
+                                  │  • TypePlanner        │
+                                  │  • RelationPlanner    │
+                                  └───────────────────────┘
+```
+
+The extension planners intercept specific parts of the SQL AST during the
+`SqlToRel` phase and allow you to customize how they are converted to DataFusion's
+logical plan.
+
+## Extension Points
+
+DataFusion provides three planner traits for extending SQL:
+
+| Trait             | Purpose                                 | Registration Method                        |
+| ----------------- | --------------------------------------- | ------------------------------------------ |
+| `ExprPlanner`     | Custom expressions and operators        | `ctx.register_expr_planner()`              |

Review Comment:
   A nit is that it would be nice to make these links too 
   
   ```
   [`ExprPlanner`]
   ```
   
   (I think it should "just work" given you made the link already below)



##########
docs/source/library-user-guide/extending-sql.md:
##########
@@ -0,0 +1,339 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Extending DataFusion's SQL Syntax

Review Comment:
   A minor comment: I think we could drop "DataFusion" from this title, as it 
is implied by the context.
   
   <img width="363" height="568" alt="Image" src="https://github.com/user-attachments/assets/e9ae275d-3845-4187-b393-4447379d052e" />
   
   However, as I wrote this comment, I see there are some other page titles 
that are similarly overly verbose that could be improved. I'll go make a PR to 
propose cleaning them up.



##########
docs/source/library-user-guide/functions/adding-udfs.md:
##########
@@ -1492,7 +1492,9 @@ async fn main() -> Result<()> {
 
 ## Custom Expression Planning
 
-DataFusion provides native support for common SQL operators by default such as `+`, `-`, `||`. However it does not provide support for other operators such as `@>`. To override DataFusion's default handling or support unsupported operators, developers can extend DataFusion by implementing custom expression planning, a core feature of DataFusion
+DataFusion provides native support for common SQL operators by default such as `+`, `-`, `||`. However it does not provide support for other operators such as `@>`. To override DataFusion's default handling or support unsupported operators, developers can extend DataFusion by implementing custom expression planning, a core feature of DataFusion.

Review Comment:
   Some rationale (advertising?) might also help with context here. Maybe it 
would also help to mention SQL constructs other than operators. 
   
   Something like this perhaps
   
   ```suggestion
   DataFusion provides native support for common SQL operators and constructs 
by default such as `+`, `-`, `||`. However it does not provide support for 
other operators such as `@>` or `TABLESAMPLE` which are less common or vary 
more between SQL dialects. To override DataFusion's default handling or support 
unsupported operators, developers can extend DataFusion by implementing custom 
expression planning, a core feature of DataFusion.
   ```



##########
docs/source/library-user-guide/extending-sql.md:
##########
@@ -0,0 +1,339 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Extending DataFusion's SQL Syntax
+
+DataFusion provides a flexible extension system that allows you to customize SQL
+parsing and planning without modifying the core codebase. This is useful when you
+need to:
+
+- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON)
+- Add custom data types not natively supported
+- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`
+
+## Architecture Overview
+
+When DataFusion processes a SQL query, it goes through these stages:
+
+```text

Review Comment:
   This is a great flow chart ❤️ 
   
   One thing I noticed is that it is slightly clipped:
   <img width="819" height="576" alt="Image" src="https://github.com/user-attachments/assets/31fcf2d0-f52f-49f6-93f4-d2a7b79c8c1c" />
   
   If we were able to tighten up the SqlToRel box (maybe make it two lines) it 
might fit
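One possible tightened layout, sketched only to illustrate the suggestion: the `SqlToRel` label is split across two lines, which makes the diagram nine columns narrower than the version in the diff. Whether that is enough to avoid clipping in the rendered page is untested.

```text
┌─────────────┐    ┌─────────┐    ┌──────────────────────┐    ┌─────────────┐
│ SQL String  │───▶│ Parser  │───▶│ SqlToRel             │───▶│ LogicalPlan │
│             │    │         │    │ (SQL to LogicalPlan) │    │             │
└─────────────┘    └─────────┘    └──────────────────────┘    └─────────────┘
                                              │
                                              │ uses
                                              ▼
                                  ┌───────────────────────┐
                                  │  Extension Planners   │
                                  │  • ExprPlanner        │
                                  │  • TypePlanner        │
                                  │  • RelationPlanner    │
                                  └───────────────────────┘
```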



##########
docs/source/library-user-guide/extending-sql.md:
##########
@@ -0,0 +1,339 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Extending DataFusion's SQL Syntax
+
+DataFusion provides a flexible extension system that allows you to customize SQL
+parsing and planning without modifying the core codebase. This is useful when you
+need to:
+
+- Support custom operators from other SQL dialects (e.g., PostgreSQL's `->` for JSON)
+- Add custom data types not natively supported
+- Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`
+
+## Architecture Overview
+
+When DataFusion processes a SQL query, it goes through these stages:
+
+```text
+┌─────────────┐    ┌─────────┐    ┌───────────────────────────────┐    ┌─────────────┐
+│ SQL String  │───▶│ Parser  │───▶│ SqlToRel (SQL to LogicalPlan) │───▶│ LogicalPlan │
+└─────────────┘    └─────────┘    └───────────────────────────────┘    └─────────────┘
+                                              │
+                                              │ uses
+                                              ▼
+                                  ┌───────────────────────┐
+                                  │  Extension Planners   │
+                                  │  • ExprPlanner        │
+                                  │  • TypePlanner        │
+                                  │  • RelationPlanner    │
+                                  └───────────────────────┘
+```
+
+The extension planners intercept specific parts of the SQL AST during the
+`SqlToRel` phase and allow you to customize how they are converted to DataFusion's
+logical plan.
+
+## Extension Points
+
+DataFusion provides three planner traits for extending SQL:
+
+| Trait             | Purpose                                 | Registration Method                        |
+| ----------------- | --------------------------------------- | ------------------------------------------ |
+| `ExprPlanner`     | Custom expressions and operators        | `ctx.register_expr_planner()`              |
+| `TypePlanner`     | Custom SQL data types                   | `SessionStateBuilder::with_type_planner()` |
+| `RelationPlanner` | Custom FROM clause elements (relations) | `ctx.register_relation_planner()`          |
+
+**Planner Precedence**: Multiple `ExprPlanner`s and `RelationPlanner`s can be
+registered; they are invoked in reverse registration order (last registered wins).
+Return `Original(...)` to delegate to the next planner. Only one `TypePlanner`
+can be active at a time.
+
+### ExprPlanner: Custom Expressions and Operators
+
+Use [`ExprPlanner`] to customize how SQL expressions are converted to DataFusion
+logical expressions. This is useful for:
+
+- Custom binary operators (e.g., `->`, `->>`, `@>`, `?`)
+- Custom field access patterns
+- Custom aggregate or window function handling
+
+#### Available Methods
+
+| Category           | Methods                                                                            |
+| ------------------ | ---------------------------------------------------------------------------------- |
+| Operators          | `plan_binary_op`, `plan_any`                                                       |
+| Literals           | `plan_array_literal`, `plan_dictionary_literal`, `plan_struct_literal`             |
+| Functions          | `plan_extract`, `plan_substring`, `plan_overlay`, `plan_position`, `plan_make_map` |
+| Identifiers        | `plan_field_access`, `plan_compound_identifier`                                    |
+| Aggregates/Windows | `plan_aggregate`, `plan_window`                                                    |
+
+See the [ExprPlanner API documentation] for full method signatures.
+
+#### Example: Custom Arrow Operator
+
+This example maps the `->` operator to string concatenation:
+
+```rust
+use std::sync::Arc;
+use datafusion::common::DFSchema;
+use datafusion::error::Result;
+use datafusion::logical_expr::Operator;
+use datafusion::prelude::*;
+use datafusion::sql::sqlparser::ast::BinaryOperator;
+use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};
+use datafusion_expr::BinaryExpr;
+
+#[derive(Debug)]
+struct MyCustomPlanner;
+
+impl ExprPlanner for MyCustomPlanner {
+    fn plan_binary_op(
+        &self,
+        expr: RawBinaryExpr,
+        _schema: &DFSchema,
+    ) -> Result<PlannerResult<RawBinaryExpr>> {
+        match &expr.op {
+            // Map `->` to string concatenation
+            BinaryOperator::Arrow => {
+                Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
+                    left: Box::new(expr.left.clone()),
+                    right: Box::new(expr.right.clone()),
+                    op: Operator::StringConcat,
+                })))
+            }
+            _ => Ok(PlannerResult::Original(expr)),
+        }
+    }
+}
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    // Use postgres dialect to enable `->` operator parsing
+    let config = SessionConfig::new()
+        .set_str("datafusion.sql_parser.dialect", "postgres");
+    let mut ctx = SessionContext::new_with_config(config);
+
+    // Register the custom planner
+    ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;
+
+    // Now `->` works as string concatenation
+    let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
+    // Returns: "helloworld"
+    Ok(())
+}
+```
+
+For more details, see the [ExprPlanner API documentation] and the
+[expr_planner test examples].
+
+### TypePlanner: Custom Data Types
+
+Use [`TypePlanner`] to map SQL data types to Arrow/DataFusion types. This is useful
+when you need to support SQL types that aren't natively recognized.
+
+#### Example: Custom DATETIME Type
+
+```rust
+use std::sync::Arc;
+use arrow::datatypes::{DataType, TimeUnit};
+use datafusion::error::Result;
+use datafusion::prelude::*;
+use datafusion::execution::SessionStateBuilder;
+use datafusion_expr::planner::TypePlanner;
+use sqlparser::ast;
+
+#[derive(Debug)]
+struct MyTypePlanner;
+
+impl TypePlanner for MyTypePlanner {
+    fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> {
+        match sql_type {
+            // Map DATETIME(precision) to Arrow Timestamp
+            ast::DataType::Datetime(precision) => {
+                let time_unit = match precision {
+                    Some(0) => TimeUnit::Second,
+                    Some(3) => TimeUnit::Millisecond,
+                    Some(6) => TimeUnit::Microsecond,
+                    None | Some(9) => TimeUnit::Nanosecond,
+                    _ => return Ok(None), // Let default handling take over
+                };
+                Ok(Some(DataType::Timestamp(time_unit, None)))
+            }
+            _ => Ok(None), // Return None for types we don't handle
+        }
+    }
+}
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let state = SessionStateBuilder::new()
+        .with_default_features()
+        .with_type_planner(Arc::new(MyTypePlanner))
+        .build();
+
+    let ctx = SessionContext::new_with_state(state);
+
+    // Now DATETIME type is recognized
+    ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
+    Ok(())
+}
+```
+
+For more details, see the [TypePlanner API documentation].
+
+### RelationPlanner: Custom FROM Clause Elements
+
+Use [`RelationPlanner`] to handle custom relations in the FROM clause. This
+enables you to implement SQL constructs like:
+
+- `TABLESAMPLE` for sampling data
+- `PIVOT` / `UNPIVOT` for data reshaping
+- `MATCH_RECOGNIZE` for pattern matching
+- Any custom relation syntax parsed by sqlparser
+
+#### The RelationPlannerContext
+
+When implementing `RelationPlanner`, you receive a `RelationPlannerContext` that
+provides utilities for planning:
+
+| Method                      | Purpose                                         |
+| --------------------------- | ----------------------------------------------- |
+| `plan(relation)`            | Recursively plan a nested relation              |
+| `sql_to_expr(expr, schema)` | Convert SQL expression to DataFusion Expr       |
+| `context_provider()`        | Access session configuration, tables, functions |
+
+See the [RelationPlanner API documentation] for additional methods like
+`normalize_ident()` and `object_name_to_table_reference()`.
+
+#### Implementation Strategies
+
+There are two main approaches when implementing a `RelationPlanner`:
+
+1. **Rewrite to Standard SQL**: Transform custom syntax into equivalent standard
+   operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY
+   with CASE expressions). This is the simplest approach when possible.
+
+2. **Custom Logical and Physical Nodes**: Create a `UserDefinedLogicalNode` to
+   represent the operation in the logical plan, along with a custom `ExecutionPlan`
+   to execute it. Both are required for end-to-end execution.
+
+#### Example: Basic RelationPlanner Structure
+
+```rust
+use std::sync::Arc;
+use datafusion::error::Result;
+use datafusion::prelude::*;
+use datafusion_expr::planner::{
+    PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
+};
+use datafusion_sql::sqlparser::ast::TableFactor;
+
+#[derive(Debug)]
+struct MyRelationPlanner;
+
+impl RelationPlanner for MyRelationPlanner {
+    fn plan_relation(
+        &self,
+        relation: TableFactor,
+        ctx: &mut dyn RelationPlannerContext,
+    ) -> Result<RelationPlanning> {
+        match relation {
+            // Handle your custom relation
+            TableFactor::Pivot { table, alias, .. } => {
+                // Plan the input table
+                let input = ctx.plan(*table)?;
+
+                // Transform or wrap the plan as needed
+                // ...
+
+                Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
+            }
+
+            // Return Original for relations you don't handle
+            other => Ok(RelationPlanning::Original(other)),
+        }
+    }
+}
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let ctx = SessionContext::new();
+
+    // Register the custom planner
+    ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;
+
+    Ok(())
+}
+```
+
+## Complete Examples
+
+The DataFusion repository includes comprehensive examples demonstrating each
+approach:
+
+### TABLESAMPLE (Custom Logical and Physical Nodes)
+
+The [table_sample.rs] example shows a complete end-to-end implementation including:
+
+- Parsing TABLESAMPLE syntax via `RelationPlanner`
+- Custom logical node (`TableSamplePlanNode`)
+- Custom physical operator (`SampleExec`)
+- Bernoulli sampling with reproducible seeds
+
+```sql
+SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
+```

Review Comment:
   I think some of this detail is unnecessary as it repeats what is already in 
the examples -- I think you could remove the bullet points and this would be 
just as good. Something like
   
   (the same comment applies to the others too)
   
   ```suggestion
   The [table_sample.rs] example shows a complete end-to-end implementation of 
how to support queries such as
   
   ```sql
   SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
   ```
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

