shrirangmhalgi opened a new pull request, #56056:
URL: https://github.com/apache/spark/pull/56056

   ### What changes were proposed in this pull request?
   Fix `DetectAmbiguousSelfJoin` to not flag column references as ambiguous 
when the root plan is a Project on top of a self-join with a foldable join 
condition (e.g., `df.join(df, df("col") === 0).select(df("col"))`).
   
   When the join condition compares a column to a literal, it does not matter 
which side the column comes from, because both sides of the self-join hold 
identical data. The ambiguity check was incorrectly rejecting this pattern.
   
   
   ### Why are the changes needed?
   `df.join(df, df("col") === 0).select(df("col"))` throws 
`AnalysisException: Column are ambiguous` with the regular resolver when 
`spark.sql.analyzer.failAmbiguousSelfJoin` is true. The single-pass resolver 
handles this correctly. This inconsistency breaks multi-layer self-join 
patterns that work fine with the single-pass resolver.
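
   A minimal reproduction sketch of the pattern described above. The sample 
data, the local session settings, and the object name `SelfJoinRepro` are 
illustrative assumptions and are not taken from the patch. Per the description, 
the query is expected to fail analysis with the regular resolver before this 
change and to succeed after it; the snippet only reports which branch was taken.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import scala.util.{Failure, Success, Try}

object SelfJoinRepro {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("self-join-repro")
      .getOrCreate()
    import spark.implicits._

    // The ambiguity check is gated by this flag (named in the PR description).
    spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "true")

    // Hypothetical sample data; any DataFrame with a column "col" would do.
    val df = Seq(0, 1, 2).toDF("col")

    // Self-join whose condition compares a column to a literal, followed by a
    // select that refers to the column through the original DataFrame.
    val attempt = Try(df.join(df, df("col") === 0).select(df("col")).collect())

    attempt match {
      case Success(rows) =>
        println(s"query succeeded with ${rows.length} rows")
      case Failure(e: AnalysisException) =>
        println(s"analysis failed: ${e.getMessage}")
      case Failure(other) =>
        throw other
    }

    spark.stop()
  }
}
```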
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Self-join queries with foldable conditions followed by a `select` no 
longer throw a false ambiguity error.
   
   ### Design approach
   First attempted a broader fix: skip the ambiguity check for any column 
reference whose `exprId` matches the plan's output 
(`outputExprIds.contains(attr.exprId)`). This was too permissive: it broke 4 
existing tests (SPARK-28344: fail ambiguous self join - column ref in Project, 
join three tables, SPARK-33071, SPARK-35454: join four tables) because it 
suppressed legitimate ambiguity errors where the user genuinely needs to alias.
   
   Narrowed to the specific case: only skip when the root plan is a Project 
directly on top of a self-join (`leftId == rightId`) with a foldable join 
condition. This correctly targets the false positive without affecting real 
ambiguity detection.
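
   The narrowed rule can be illustrated with a toy model. This is only a 
sketch under stated assumptions: `Expr`, `Col`, `Lit`, `Plan`, `Relation`, 
`Join`, `Project`, and `AmbiguityGuard.skipAmbiguityCheck` are hypothetical 
stand-ins written for this example, not Spark's Catalyst classes and not the 
code in the patch. It also reads "foldable join condition" as "an equality 
between a column and a foldable expression", and it only handles joins whose 
children are plain relations.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's Catalyst classes.
sealed trait Expr { def foldable: Boolean }
final case class Col(name: String) extends Expr { val foldable = false }
final case class Lit(value: Any) extends Expr { val foldable = true }

sealed trait Plan
final case class Relation(datasetId: Long) extends Plan
// The condition is modeled as the two operands of an equality comparison.
final case class Join(left: Plan, right: Plan, condition: Option[(Expr, Expr)])
  extends Plan
final case class Project(columns: Seq[Col], child: Plan) extends Plan

object AmbiguityGuard {
  // True when one operand is a column and the other operand is foldable.
  private def comparesColumnToFoldable(cond: (Expr, Expr)): Boolean = cond match {
    case (_: Col, other) => other.foldable
    case (other, _: Col) => other.foldable
    case _               => false
  }

  // Skip the ambiguity check only for a Project directly on top of a self-join
  // (both sides carry the same dataset id) with a column-vs-foldable condition.
  def skipAmbiguityCheck(root: Plan): Boolean = root match {
    case Project(_, Join(Relation(leftId), Relation(rightId), Some(cond))) =>
      leftId == rightId && comparesColumnToFoldable(cond)
    case _ => false
  }
}
```

   In this model, a column-to-column condition, a join of two different 
datasets, or a root plan that is not a Project all fall through to the normal 
ambiguity check, which mirrors the intent of keeping the four tests listed 
above failing as before.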
   
   
   ### How was this patch tested?
   Added a test in `DataFrameSelfJoinSuite` verifying single-layer and 
multi-layer self-joins with foldable conditions. All 23 existing self-join 
tests are passing.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Yes.
   

