askalt commented on issue #19351: URL: https://github.com/apache/datafusion/issues/19351#issuecomment-3796239651
> [@alamb](https://github.com/alamb) As I understand it, [this](https://github.com/askalt/datafusion/commit/07ebbd4d321e4753cf566c40192b124654ff7455) implementation doesn't seem convincing because it introduces too many changes to the trait's public API. I would still like to extract runtime state from `ExecutionPlan` one way or another. In our project, we actively reuse pre-built plans, and the current `reset_plan_states` implementation doesn't allow this if the plan contains dynamic filters or a recursive subquery. > > To solve these problems, I thought about [updating](https://github.com/apache/datafusion/issues/19905) the dynamic filters in all nodes before reusing a plan. I don't like this solution because it doesn't scale: what if new expressions requiring updates before re-execution are added? In that case, similar ad-hoc manipulations would be needed for each of them. For example, we also actively use placeholders (query params) in physical plans; in the current design, updating placeholders forces a rebuild of all plan nodes. > > I believe that current design (store state directly in plans) is not friendly for re-using. If we consider how this is done, for example in PostgreSQL, the state of plans and expressions needed for execution is stored separately from the planned expressions themselves. This allows re-using plans and their expressions without the need for constant recreation. For a reference, see the [function](https://github.com/postgres/postgres/blob/9b9eaf08ab2dc22c691b22e59f1574e0f1bcc822/src/backend/executor/execExpr.c#L108-L143). > > ## Possible solution > How do you feel about the following plan? > > 1. Add a new method `ExecutionPlan::execute_with_state(...)`. If there is a downstream which is not interested in stateless-plans it should not implement it, a default behavior will call `ExecutionPlan::execute(...)` and by default plan will store its state in self as it is works now. > 2. Support stateless execution for plans across DF crates. Stateless means that metrics, futures, work tables, actual dynamic filters are taken from some context passed into `execute_with_state` instead of taking them from `self`. > 3. In the `execute_with_state` we can convert plan expressions into expressions that could be actually evaluated. For example, convert `PlaceholderExpr` into `Literal` using query params taking from the query execution context. Or convert planned dynamic filter to its executable version created by the owner (and passed also through context). I made a tiny example how it would work for `ProjectionExec`: https://github.com/askalt/datafusion/pull/4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
