Re: [PR] [SPARK-46625] CTE with Identifier clause as reference [spark]

via GitHub Wed, 03 Jul 2024 05:43:07 -0700


cloud-fan commented on PR #47180:
URL: https://github.com/apache/spark/pull/47180#issuecomment-2205988941


   I'm not a big fan of this approach, because it duplicates the handling of 
IDENTIFIER clauses in `CTESubstitution`.
   
   IMO, the root cause is that we special-case CTE resolution and run 
`CTESubstitution` as a separate batch at the very beginning. The ideal 
solution is to look up CTE relations together with the normal table lookup.
   
   My idea: let's split CTE relation resolution into two steps:
   1. Identify the available CTE relations for each `UnresolvedRelation`. 
Depending on the position of the `UnresolvedRelation`, the available CTE 
relations can be very different (e.g. in the main query, inside the CTE 
relations, in a nested CTE, etc.). Then wrap the `UnresolvedRelation` in a new 
node, `WithCTERelations`, that holds the available CTE relations.
   2. In the analyzer's main batch, wait for the IDENTIFIER clause to be 
handled, then unwrap `WithCTERelations` by looking the name up in its CTE 
relations. If the lookup fails, restore the plain `UnresolvedRelation` so that 
the normal table lookup rule can handle it later.
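
The two steps above can be sketched as a toy model. This is written in Python rather than Spark's Scala and uses none of Spark's real classes: the dataclasses only borrow the node names from the comment, and `wrap`, `resolve_identifier`, `unwrap`, and the `pending_identifier` field are hypothetical helpers invented for illustration. It assumes exact-match name lookup and treats an IDENTIFIER clause as a string that becomes known later.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UnresolvedRelation:
    # name is None while an IDENTIFIER clause is still unevaluated.
    name: Optional[str]
    # Hypothetical field: the value the IDENTIFIER clause will evaluate to.
    pending_identifier: Optional[str] = None


@dataclass
class CTERelationRef:
    name: str
    definition: str


@dataclass
class WithCTERelations:
    child: UnresolvedRelation
    # CTE name -> definition, as visible at the child's position in the query.
    cte_relations: Dict[str, str] = field(default_factory=dict)


def wrap(relation, visible_ctes):
    """Step 1: record which CTE relations are visible at this position."""
    return WithCTERelations(relation, dict(visible_ctes))


def resolve_identifier(node):
    """Stand-in for the rule that evaluates the IDENTIFIER clause."""
    child = node.child
    if child.name is None and child.pending_identifier is not None:
        child = UnresolvedRelation(name=child.pending_identifier)
    return WithCTERelations(child, node.cte_relations)


def unwrap(node):
    """Step 2: look the name up in the recorded CTE relations."""
    name = node.child.name
    if name is None:
        return node  # IDENTIFIER not handled yet: leave the wrapper in place
    if name in node.cte_relations:
        return CTERelationRef(name, node.cte_relations[name])
    return node.child  # lookup failed: fall back to normal table lookup


ctes = {"t": "SELECT 1 AS c"}
pending = wrap(UnresolvedRelation(None, pending_identifier="t"), ctes)
hit = unwrap(resolve_identifier(pending))
miss = unwrap(wrap(UnresolvedRelation("other"), ctes))
```

Here `unwrap(pending)` returns the wrapper unchanged because the name is not known yet, `hit` is a `CTERelationRef` for `t`, and `miss` is the plain `UnresolvedRelation` handed back for ordinary table lookup.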


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

