[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Huw updated SPARK-45583: ------------------------ Fix Version/s: 3.5.0 > Spark SQL returning incorrect values for full outer join on keys with the > same name. > ------------------------------------------------------------------------------------ > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.1 > Reporter: Huw > Priority: Major > Fix For: 3.5.0 > > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` column is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org