findepi commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015995295
Maintaining streaming processing is very useful in a number of circumstances: a query with LIMIT, a query with TopN over sorted input data, an interactive query.

I don't know what CTE reuse execution v1 should look like, but we should work towards the goal that all plan nodes dependent on the CTE can incrementally consume the data passed to the CTE. Effectively, each should maintain an independent iterator over the data being produced. Additionally, the CTE should be aware of the number of upstream consumers, which allows it to free data no longer needed by any consumer.

Implementation-wise, it could be done with a separate blocking deque (VecDeque?) for each consumer. Incoming data is added to all queues (shared Arcs, no data copying). Each consumer pulls from its own queue. When the queue is empty, the consumer waits for new data or an end-of-processing signal.
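
To make the per-consumer queue idea concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: the names `Fanout`, `Consumer`, `fanout`, `next_batch` and `run_demo` are hypothetical, a `Vec<i32>` stands in for a record batch, and blocking threads with a `Condvar` stand in for whatever async stream machinery a real implementation would use. Each pushed batch is wrapped in one `Arc` and a clone of that `Arc` goes into every consumer's `VecDeque`, so the data itself is never copied and is freed once the last consumer drops its handle.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Pending batches for one consumer, plus the end-of-processing flag.
struct QueueState<T> {
    items: VecDeque<Arc<T>>,
    finished: bool,
}

/// A blocking queue: the mutex guards the state, the condvar wakes a waiting reader.
struct ConsumerQueue<T> {
    state: Mutex<QueueState<T>>,
    ready: Condvar,
}

/// Producer side (hypothetical name): knows every consumer queue.
struct Fanout<T> {
    queues: Vec<Arc<ConsumerQueue<T>>>,
}

/// Consumer side (hypothetical name): an independent cursor over the produced data.
struct Consumer<T> {
    queue: Arc<ConsumerQueue<T>>,
}

/// Creates one producer handle and `consumers` reader handles.
fn fanout<T>(consumers: usize) -> (Fanout<T>, Vec<Consumer<T>>) {
    let queues: Vec<Arc<ConsumerQueue<T>>> = (0..consumers)
        .map(|_| {
            Arc::new(ConsumerQueue {
                state: Mutex::new(QueueState { items: VecDeque::new(), finished: false }),
                ready: Condvar::new(),
            })
        })
        .collect();
    let readers = queues.iter().map(|q| Consumer { queue: Arc::clone(q) }).collect();
    (Fanout { queues }, readers)
}

impl<T> Fanout<T> {
    /// Adds the batch to all queues; only the Arc pointer is cloned.
    fn push(&self, batch: T) {
        let batch = Arc::new(batch);
        for q in &self.queues {
            q.state.lock().unwrap().items.push_back(Arc::clone(&batch));
            q.ready.notify_one();
        }
    }

    /// Signals end of processing to every consumer.
    fn finish(&self) {
        for q in &self.queues {
            q.state.lock().unwrap().finished = true;
            q.ready.notify_all();
        }
    }
}

impl<T> Consumer<T> {
    /// Returns the next batch, blocking while the queue is empty and the
    /// producer has not finished. Returns `None` once everything is drained.
    fn next_batch(&self) -> Option<Arc<T>> {
        let mut state = self.queue.state.lock().unwrap();
        loop {
            if let Some(batch) = state.items.pop_front() {
                return Some(batch);
            }
            if state.finished {
                return None;
            }
            state = self.queue.ready.wait(state).unwrap();
        }
    }
}

/// Two consumer threads each sum every batch produced by a single producer.
fn run_demo() -> (i32, i32) {
    let (producer, consumers) = fanout::<Vec<i32>>(2);
    let handles: Vec<_> = consumers
        .into_iter()
        .map(|c| {
            thread::spawn(move || {
                let mut sum = 0;
                while let Some(batch) = c.next_batch() {
                    sum += batch.iter().sum::<i32>();
                }
                sum
            })
        })
        .collect();
    for batch in vec![vec![1, 2, 3], vec![4, 5], vec![6]] {
        producer.push(batch);
    }
    producer.finish();
    let sums: Vec<i32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (sums[0], sums[1])
}
```

In this sketch the reference count of each `Arc` plays the role of tracking how many consumers still need a batch, so no explicit bookkeeping is required for freeing. The queues are unbounded, so a slow consumer makes its queue grow; a real implementation would likely need some form of backpressure or spilling, which this sketch does not attempt.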