gabotechs opened a new issue, #16088: URL: https://github.com/apache/datafusion/issues/16088
### Is your feature request related to a problem or challenge?

There is one assumption that seems to hold for most `PhysicalPlan` implementations, but not for `RepartitionExec`: that calling `.execute()` on a node immediately calls `.execute()` on its input. `RepartitionExec.execute()` does not immediately call `RepartitionExec.input.execute()`; instead, the call is made lazily when the returned arrow stream is first polled.

Why eager propagation is useful is best explained by the use case I'm currently dealing with: I have a custom leaf `PhysicalPlan` implementation that performs an API call (let's call it `MyApiExec`), and I would like to eagerly call the API and pre-fetch some data before the overall arrow stream is first polled. To do that, I'd like to `tokio::spawn` a task when DataFusion calls `MyApiExec::execute`; the task would run in the background, pre-fetching data from the API even while the stream is not being polled.

My expectation would be that `.execute()` is propagated immediately across the execution graph.

However, I found that the current implementation of the `RepartitionExec` node shipped in https://github.com/apache/datafusion/issues/10014 delays the `.execute()` calls to its children until the first message in the arrow stream is polled. This means the pre-fetching logic in `MyApiExec` runs lazily upon the first message poll rather than eagerly before any poll.

### Describe the solution you'd like

Instead of lazily calling `input.execute()` inside `RepartitionExec` here: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/repartition/mod.rs#L598-L616

Call it immediately, without any async gap, so that further `.execute()` calls to the children happen before any message is polled.

### Describe alternatives you've considered

Spawning the pre-fetching task in `MyApiExec` when the node is created at planning time, but that feels counterintuitive and error-prone.

### Additional context

_No response_
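The difference between lazy and eager propagation of `.execute()` described in this issue can be illustrated with a small standard-library-only model. This is a sketch under loud assumptions, not DataFusion code: the `Plan` trait, `Leaf`, `LazyParent`, `EagerParent`, and `trace` are all hypothetical names, and a plain `Iterator` stands in for the async arrow stream. `Leaf` plays the role of `MyApiExec`, and the point where it logs `leaf.execute` is where a pre-fetching task would be spawned.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared event log used to record the order in which things happen.
type Log = Rc<RefCell<Vec<String>>>;
// A synchronous iterator stands in for the async record batch stream.
type Stream = Box<dyn Iterator<Item = u32>>;

// Hypothetical, heavily simplified stand-in for a physical plan node.
trait Plan {
    fn execute(&self, log: &Log) -> Stream;
}

// Plays the role of `MyApiExec`: a pre-fetch would start inside `execute`.
struct Leaf;

impl Plan for Leaf {
    fn execute(&self, log: &Log) -> Stream {
        log.borrow_mut().push("leaf.execute".to_string());
        Box::new(1..=3)
    }
}

// Defers `input.execute()` until the returned stream is first polled.
struct LazyParent {
    input: Rc<dyn Plan>,
}

impl Plan for LazyParent {
    fn execute(&self, log: &Log) -> Stream {
        let input = Rc::clone(&self.input);
        let log = Rc::clone(log);
        let mut inner: Option<Stream> = None;
        Box::new(std::iter::from_fn(move || {
            // The child is only executed here, on the first poll.
            inner.get_or_insert_with(|| input.execute(&log)).next()
        }))
    }
}

// Calls `input.execute()` right away, before returning the stream.
struct EagerParent {
    input: Rc<dyn Plan>,
}

impl Plan for EagerParent {
    fn execute(&self, log: &Log) -> Stream {
        self.input.execute(log)
    }
}

// Executes a plan, notes when `execute` returned, then polls once.
fn trace(plan: &dyn Plan) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let mut stream = plan.execute(&log);
    log.borrow_mut().push("parent.execute returned".to_string());
    let _first = stream.next(); // first poll of the stream
    let events = log.borrow().clone();
    events
}
```

In this model, `trace(&LazyParent { input: Rc::new(Leaf) })` records `leaf.execute` only after the parent's `execute` has returned, while `trace(&EagerParent { input: Rc::new(Leaf) })` records it before. The eager ordering is the one a leaf node needs if it wants to start background work before any poll.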
