suibianwanwank commented on issue #8777:
URL: https://github.com/apache/datafusion/issues/8777#issuecomment-3015967757

   Hi, I’d like to give this task a try.
   
   My plan is to first support materialized CTEs through the 
`WITH ... AS MATERIALIZED` syntax. After that, we can explore broader 
optimizations in the optimizer phase. 
   
   As for implementing the CTE at the `ExecutionPlan` level, I currently have 
two ideas:
   
   1. Add a top-level `CTEQueryExec` node that runs the CTE query to completion 
(blocking) and caches its result in a `CTEWorkTable`, so that subsequent 
`CTEScan` nodes can read from it. (It seems DuckDB takes a similar approach.)
   However, I'm not sure whether this is feasible in DataFusion, as it might 
require blocking the `ExecutionPlan` and collecting all `RecordBatch`es.
   
   2. Materialize the CTE before execution by creating a temporary table in the 
current schema and writing the required data into it. Then, replace all 
subsequent scans of the CTE with scans of this temporary table. (I noticed 
that Databend uses a similar approach.)
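   
   To make idea 1 a bit more concrete, here is a minimal sketch using only the 
Rust standard library. It is not DataFusion code: `Batch` is a stand-in for 
`RecordBatch`, the closures stand in for plan execution, and the shapes of 
`CteWorkTable`, `CteScan`, and `cte_query_exec` are assumptions made up for 
illustration. It only shows the control flow: run the CTE once to completion, 
cache the result, then let several scans read the cached batches.

```rust
use std::sync::{Arc, OnceLock};

/// Stand-in for an Arrow `RecordBatch` (assumption, for illustration only).
type Batch = Vec<i64>;

/// Hypothetical work table holding the fully materialized CTE result.
#[derive(Default)]
pub struct CteWorkTable {
    batches: OnceLock<Vec<Batch>>,
}

impl CteWorkTable {
    /// Stores the materialized batches; returns false if already filled.
    pub fn fill(&self, batches: Vec<Batch>) -> bool {
        self.batches.set(batches).is_ok()
    }

    /// Returns the cached batches, or `None` before materialization.
    pub fn batches(&self) -> Option<&[Batch]> {
        self.batches.get().map(|b| b.as_slice())
    }
}

/// Hypothetical scan node that reads from the shared work table.
pub struct CteScan {
    table: Arc<CteWorkTable>,
}

impl CteScan {
    pub fn new(table: Arc<CteWorkTable>) -> Self {
        Self { table }
    }

    /// Flattens the cached batches into rows; errors if the CTE has not run.
    pub fn rows(&self) -> Result<Vec<i64>, String> {
        let batches = self
            .table
            .batches()
            .ok_or_else(|| "CTE not materialized yet".to_string())?;
        Ok(batches.iter().flatten().copied().collect())
    }
}

/// Hypothetical top-level node: runs the CTE query to completion (blocking),
/// caches the result, and only then runs the main plan.
pub fn cte_query_exec<T>(
    table: &CteWorkTable,
    cte_query: impl FnOnce() -> Vec<Batch>,
    main_plan: impl FnOnce() -> T,
) -> T {
    let _newly_filled = table.fill(cte_query());
    main_plan()
}

/// Two scans share one materialization; returns (sum, row count, CTE runs).
pub fn demo() -> (i64, usize, u32) {
    let table = Arc::new(CteWorkTable::default());
    let scan_a = CteScan::new(Arc::clone(&table));
    let scan_b = CteScan::new(Arc::clone(&table));
    let mut runs = 0;
    let (sum, count) = cte_query_exec(
        &table,
        || {
            runs += 1; // counts how often the CTE query actually executes
            vec![vec![1, 2], vec![3]]
        },
        || {
            let sum: i64 = scan_a.rows().unwrap().iter().sum();
            let count = scan_b.rows().unwrap().len();
            (sum, count)
        },
    );
    (sum, count, runs)
}
```

   Calling `demo()` runs the CTE closure once even though two scans read its 
output. Idea 2 would presumably differ mainly in where the cached batches live 
(a temporary table registered in the schema rather than a node-local work 
table), which is the part I'd like feedback on.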
   
   I’d like to gather some feedback first to make sure the design is general 
enough and correct. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

