suibianwanwank commented on issue #8777: URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015967757
Hi, I’d like to take a stab at this task. My plan is to first support CTEs declared with the `WITH ... AS MATERIALIZED` syntax. After that, we can explore broader optimizations in the optimizer phase.

As for implementing the CTE in the `ExecutionPlan`, I currently have two ideas:

1. Add a top-level `CTEQueryExec` node that ensures the CTE query is fully executed (blocking) and caches its result in a `CTEWorkTable`, so that subsequent `CTEScan` nodes can read from it. (It seems DuckDB takes a similar approach.) However, I'm not sure whether this is feasible in DataFusion, as it might require blocking the `ExecutionPlan` and collecting all `RecordBatch`es.
2. Materialize the CTE before execution by creating a temporary table in the current schema and writing the required data into it, then replace every subsequent scan of the CTE with a query against this temporary table. (I noticed that Databend uses a similar approach.)

I’d like to gather some feedback first to make sure the design is general enough and correct. Thanks!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
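
For discussion, here is a rough sketch of the first idea from the comment above: a blocking materialization step whose cached result is shared by every later scan of the CTE. It uses only the Rust standard library and is not DataFusion code. The names `CteWorkTable` and `CteScan` are borrowed from the proposal, but their fields and methods here are hypothetical, and `Batch` is a stand-in for an Arrow `RecordBatch`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Stand-in for an Arrow `RecordBatch`: one batch is just a vector of values.
type Batch = Vec<i64>;

/// Hypothetical cache holding the fully materialized result of one CTE.
pub struct CteWorkTable {
    batches: OnceLock<Vec<Batch>>,
}

impl CteWorkTable {
    pub fn new() -> Self {
        Self { batches: OnceLock::new() }
    }

    /// Runs `produce` to completion the first time it is called (the blocking
    /// step) and returns the cached batches on every later call.
    pub fn get_or_materialize<F>(&self, produce: F) -> &[Batch]
    where
        F: FnOnce() -> Vec<Batch>,
    {
        self.batches.get_or_init(produce).as_slice()
    }
}

/// Hypothetical scan node: each reference to the CTE shares one work table.
pub struct CteScan {
    table: Arc<CteWorkTable>,
}

impl CteScan {
    pub fn new(table: Arc<CteWorkTable>) -> Self {
        Self { table }
    }

    /// Sums the cached values, materializing them if no scan has done so yet.
    pub fn sum<F>(&self, produce: F) -> i64
    where
        F: FnOnce() -> Vec<Batch>,
    {
        self.table.get_or_materialize(produce).iter().flatten().sum()
    }
}

/// Two scans over the same CTE. Returns the sum seen by each scan and the
/// number of times the CTE query actually ran.
pub fn demo() -> (i64, i64, usize) {
    let runs = AtomicUsize::new(0);
    // Pretend CTE query: counts its executions and yields two batches.
    let cte_query = || {
        runs.fetch_add(1, Ordering::SeqCst);
        vec![vec![1, 2, 3], vec![4, 5]]
    };

    let table = Arc::new(CteWorkTable::new());
    let first = CteScan::new(Arc::clone(&table));
    let second = CteScan::new(table);

    let a = first.sum(&cte_query);
    let b = second.sum(&cte_query);
    (a, b, runs.load(Ordering::SeqCst))
}
```

Calling `demo()` returns `(15, 15, 1)`: both scans see the same data, and the CTE query runs only once. The sketch holds the whole result in memory, which is exactly the concern raised about collecting all batches; the temporary-table idea would move that storage out of the plan instead.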