Jefffrey commented on code in PR #18372:
URL: https://github.com/apache/datafusion/pull/18372#discussion_r2534000413
##########
docs/source/library-user-guide/extending-operators.md:
##########
@@ -57,3 +54,115 @@ fn agg_to_table_scan(result: f64, schema: SchemaRef) ->
Result<LogicalPlan> {
```
To get a deeper dive into the usage of the µWheel project, visit the [blog
post](https://uwheel.rs/post/datafusion_uwheel/) by Max Meldrum.
+
+## Example 2: TopK Operator
+
+This example demonstrates creating a custom TopK operator that optimizes
queries like "find the top 3 customers by revenue". Instead of fully sorting
the input and discarding all but the top K elements, the TopK operator
maintains only the K largest elements in memory, significantly reducing memory
usage.
+
+### Overview
+
+Creating a custom operator in DataFusion requires implementing four key
components that work together:
+
+1. **Logical Plan Node** (`TopKPlanNode`) - Represents the TopK operation in
the logical plan
+2. **Optimizer Rule** (`TopKOptimizerRule`) - Detects `LIMIT(SORT(...))`
patterns and replaces them with TopK
+3. **Physical Planner** (`TopKPlanner`) - Converts the logical TopK node into
a physical execution plan
+4. **Physical Execution** (`TopKExec`) - Executes the TopK algorithm on actual
data
+
+### Implementation
+
+The optimizer rule identifies queries with a `LIMIT` clause applied to a
`SORT` operation. When this pattern is detected, it replaces the combination
with a single `TopK` node:
+```rust,ignore
+impl OptimizerRule for TopKOptimizerRule {
+ fn rewrite(
+ &self,
+ plan: LogicalPlan,
+ _config: &dyn OptimizerConfig,
+ ) -> Result<Transformed> {
Review Comment:
I'm starting to think it would be better to drop the `ignore` annotation, since
some of this syntax is not correct.
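
For illustration only, here is a minimal sketch of the idea the quoted text describes (keeping just the K largest elements instead of fully sorting the input), written against the standard library so it compiles without `ignore`. It is not the PR's `TopKExec` and uses no DataFusion APIs; the `top_k` helper and its `i64` inputs are hypothetical names chosen for the example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the `k` largest values from `input`, in descending order.
/// Hypothetical helper for illustration; not part of DataFusion or the PR.
fn top_k<I>(input: I, k: usize) -> Vec<i64>
where
    I: IntoIterator<Item = i64>,
{
    if k == 0 {
        return Vec::new();
    }
    // Min-heap (via `Reverse`) that never holds more than `k` values,
    // so memory use is bounded by `k` rather than by the input size.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k);
    for value in input {
        if heap.len() < k {
            heap.push(Reverse(value));
        } else if let Some(mut smallest) = heap.peek_mut() {
            // Replace the smallest kept value when a larger one arrives.
            if value > smallest.0 {
                *smallest = Reverse(value);
            }
        }
    }
    // Ascending order of `Reverse<i64>` is descending order of the inner values.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(v)| v)
        .collect()
}
```

Calling `top_k(vec![5, 1, 9, 3, 7, 2], 3)` keeps at most three values in the heap at any time and returns them largest first. A real operator would work on record batches and sort expressions rather than plain integers, so treat this only as a sketch of the memory argument.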
##########
docs/source/library-user-guide/extending-operators.md:
##########
@@ -43,11 +42,9 @@ fn rewrite(
Ok(Transformed::no(plan))
}
}
-```
-```rust,ignore
// Converts a uwheel aggregate result to a TableScan with a MemTable as source
-fn agg_to_table_scan(result: f64, schema: SchemaRef) -> Result<LogicalPlan> {
+fn agg_to_table_scan(result: f64, schema: SchemaRef) -> Result {
Review Comment:
I'm very confused about why these changes are being made, especially
considering they make the example invalid.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]