adriangb opened a new issue, #22419:
URL: https://github.com/apache/datafusion/issues/22419

   #21929 added a `try_to_proto` / `try_from_proto` hook on `PhysicalExpr` so 
each expression can serialize itself and keep its internal state private, 
instead of round-tripping through a central ~300-line `downcast_ref` chain in 
`datafusion-proto`. The "Future work" section of that PR calls out applying the 
same pattern to `ExecutionPlan`; this issue tracks that work.
   
   ## Why
   
   `ExecutionPlan` serialization has the same shape (and the same problems) as 
`PhysicalExpr` had:
   
   - A central downcast chain in 
[`datafusion/proto/src/physical_plan/mod.rs`](https://github.com/apache/datafusion/blob/main/datafusion/proto/src/physical_plan/mod.rs)
 — ~30 `plan.downcast_ref::<…Exec>()` branches on encode, plus a symmetric 
`match` on `PhysicalPlanType` on decode.
   - Internal state has to be made `pub` so the central encoder can read it. 
#22011 and #21807 are recent examples — both added public methods on 
`ExecutionPlan` impls (`from_parts`, `inner()`, `original_children`, 
`remapped_children`, etc.) for the sole purpose of letting `datafusion-proto` 
reach in.
   - Easy to forget when adding a new `ExecutionPlan`, or when adding a field 
to an existing one — the bug only fires when someone serializes a plan that 
uses it (see e.g. #19379, #22065).
   - Third-party `ExecutionPlan` impls have to go through 
`PhysicalExtensionCodec`, while built-ins go through the central match — same 
asymmetry the `PhysicalExpr` work just fixed.
   
   ## What
   
   Mirror the pattern from #21929:
   
   1. Add `try_to_proto` / `try_from_proto` hooks on `ExecutionPlan`, 
feature-gated, default `Ok(None)` (fall through to the existing downcast chain).
   2. Migrate one `ExecutionPlan` as the working demo (e.g. `FilterExec` or 
`ProjectionExec`).
   3. Open one follow-up PR per remaining built-in `ExecutionPlan` to migrate 
it. No wire-format change at any step.
   4. Once all built-ins are migrated, delete the central downcast chain and 
revert the `pub`-for-proto scaffolding added in #22011 / #21807.
   
   ## Scope
   
   Built-in `ExecutionPlan` impls under `datafusion/physical-plan`, including 
(non-exhaustive): `ProjectionExec`, `FilterExec`, `GlobalLimitExec`, 
`LocalLimitExec`, `HashJoinExec`, `SymmetricHashJoinExec`, `SortMergeJoinExec`, 
`CrossJoinExec`, `NestedLoopJoinExec`, `AggregateExec`, `EmptyExec`, 
`PlaceholderRowExec`, `CoalesceBatchesExec`, `CoalescePartitionsExec`, 
`RepartitionExec`, `SortExec`, `SortPreservingMergeExec`, `UnionExec`, 
`InterleaveExec`, `WindowAggExec`, `BoundedWindowAggExec`, `DataSinkExec`, 
`UnnestExec`, `CooperativeExec`, `LazyMemoryExec`, `AsyncFuncExec`, 
`BufferExec`, `ScalarSubqueryExec`, `ExplainExec`, `AnalyzeExec`, 
`DataSourceExec` (and its `FileScanConfig` / `MemorySourceConfig` sub-encoders).
   
   The `DataSourceExec` branch is the gnarliest — it itself dispatches on 
`FileSource` / `MemorySourceConfig` / file format sinks (`CsvSink`, `JsonSink`, 
`ParquetSink`). It probably wants the same trait-hook treatment one layer down, 
tracked as a sub-task.
   
   Related: #22418 (the `PhysicalExpr` equivalent).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to