cloud-fan commented on PR #47180: URL: https://github.com/apache/spark/pull/47180#issuecomment-2205988941
I'm not a big fan of this approach, as it duplicates the handling of IDENTIFIER clauses in `CTESubstitution`. IMO, the root cause is that we special-case CTE resolution and run `CTESubstitution` as a separate batch at the very beginning. The ideal solution is to look up CTE relations together with the normal table lookup. My idea is to split CTE relation resolution into two steps:

1. Identify the CTE relations available to each `UnresolvedRelation`. Depending on where the `UnresolvedRelation` appears (in the main query, in a CTE relation definition, in a nested CTE, etc.), the available CTE relations can be very different. We then wrap the `UnresolvedRelation` in a new node, `WithCTERelations`, that holds the available CTE relations.
2. In the analyzer's main batch, wait until the IDENTIFIER clause has been handled, then unwrap `WithCTERelations` by looking the name up among its CTE relations. If the lookup fails, restore the plain `UnresolvedRelation` so that the normal table lookup rule can handle it later.
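To make the two steps concrete, here is a minimal, self-contained sketch on a toy plan model. It is not Spark code: `Plan`, `CTEDef`, `CTERelationRef`, `TableRelation`, `IdentifierClause`, `Literal`, `wrapCTEs` and `analyze` are all hypothetical stand-ins, and `IDENTIFIER(...)` is assumed to take a constant string. Only the `WithCTERelations` name comes from the proposal above. Step 1 (`wrapCTEs`) walks the plan once, tracking which CTE names are in scope. Step 2 runs a fixed-point batch in which the wrapper is unwrapped only after its IDENTIFIER clause has become a plain name.

```scala
// Toy logical-plan model (hypothetical; NOT Spark's real classes).
sealed trait NameRef
final case class Literal(name: String) extends NameRef
final case class IdentifierClause(arg: String) extends NameRef // IDENTIFIER('arg'), constant arg assumed

sealed trait Plan
final case class UnresolvedRelation(name: NameRef) extends Plan
// Holds the CTE names (-> def ids) visible at this position. Treated as a leaf so
// table lookup cannot resolve the wrapped relation before CTE lookup has run.
final case class WithCTERelations(relation: UnresolvedRelation, ctes: Map[String, Int]) extends Plan
final case class CTERelationRef(id: Int) extends Plan
final case class TableRelation(name: String) extends Plan
final case class CTEDef(id: Int, name: String, plan: Plan)
final case class With(defs: Seq[CTEDef], body: Plan) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan

object CTESketch {
  // Step 1: record the CTE relations available at each UnresolvedRelation.
  def wrapCTEs(plan: Plan, scope: Map[String, Int]): Plan = plan match {
    case With(defs, body) =>
      // Each def sees the defs declared before it; inner names shadow outer ones.
      val (newDefs, innerScope) = defs.foldLeft((Vector.empty[CTEDef], scope)) {
        case ((acc, s), d) => (acc :+ d.copy(plan = wrapCTEs(d.plan, s)), s + (d.name -> d.id))
      }
      With(newDefs, wrapCTEs(body, innerScope))
    case u: UnresolvedRelation if scope.nonEmpty => WithCTERelations(u, scope)
    case Join(l, r) => Join(wrapCTEs(l, scope), wrapCTEs(r, scope))
    case other => other
  }

  // Bottom-up transform; WithCTERelations is deliberately not descended into.
  def transformUp(p: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = p match {
      case With(defs, body) =>
        With(defs.map(d => d.copy(plan = transformUp(d.plan)(rule))), transformUp(body)(rule))
      case Join(l, r) => Join(transformUp(l)(rule), transformUp(r)(rule))
      case leaf => leaf
    }
    rule.applyOrElse(withNewChildren, identity[Plan])
  }

  val resolveIdentifierClause: PartialFunction[Plan, Plan] = {
    case UnresolvedRelation(IdentifierClause(arg)) => UnresolvedRelation(Literal(arg))
    case w @ WithCTERelations(UnresolvedRelation(IdentifierClause(arg)), _) =>
      w.copy(relation = UnresolvedRelation(Literal(arg)))
  }

  // Step 2: fires only once the name is a plain literal (IDENTIFIER already handled).
  val unwrapCTERelations: PartialFunction[Plan, Plan] = {
    case WithCTERelations(u @ UnresolvedRelation(Literal(n)), ctes) =>
      ctes.get(n).map(id => CTERelationRef(id)).getOrElse(u) // miss: restore for table lookup
  }

  def resolveTables(catalog: Set[String]): PartialFunction[Plan, Plan] = {
    case UnresolvedRelation(Literal(n)) if catalog(n) => TableRelation(n)
  }

  // Main batch: apply all rules until the plan stops changing.
  def analyze(plan: Plan, catalog: Set[String]): Plan = {
    val rules = Seq(resolveIdentifierClause, unwrapCTERelations, resolveTables(catalog))
    var current = wrapCTEs(plan, Map.empty)
    var changed = true
    while (changed) {
      val next = rules.foldLeft(current)((p, r) => transformUp(p)(r))
      changed = next != current
      current = next
    }
    current
  }
}
```

For `WITH t AS (SELECT * FROM tbl) SELECT * FROM IDENTIFIER('t') JOIN other`, the IDENTIFIER clause is first turned into the name `t`, which then resolves to the CTE even if the catalog also contains a table `t`. `other` is not a CTE, so it falls back to the normal table lookup. Keeping `WithCTERelations` a leaf in the traversal is one way, in this sketch, to ensure the CTE lookup always takes precedence over the catalog.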