[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Huw updated SPARK-45583: ------------------------ Description: {{The following query gives the wrong results.}} {{WITH people as (}} {{ SELECT * FROM (VALUES }} {{ (1, 'Peter'), }} {{ (2, 'Homer'), }} {{ (3, 'Ned'),}} {{ (3, 'Jenny')}} {{ ) AS Idiots(id, FirstName)}} {{{}){}}}{{{}, location as ({}}} {{ SELECT * FROM (VALUES}} {{ (1, 'sample0'),}} {{ (1, 'sample1'),}} {{ (2, 'sample2') }} {{ ) as Locations(id, address)}} {{{}){}}}{{{}SELECT{}}} {{ *}} {{FROM}} {{ people}} {{FULL OUTER JOIN}} {{ location}} {{ON}} {{ people.id = location.id}} {{We find the following table:}} ||id: integer||FirstName: string||id: integer||address: string|| |2|Homer|2|sample2| |null|Ned|null|null| |null|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| {{But clearly the first `id` table is wrong, the nulls should be 3.}} If we rename the id column in (only) the person table to pid we get the correct results: ||pid: integer||FirstName: string||id: integer||address: string|| |2|Homer|2|sample2| |3|Ned|null|null| |3|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| was: {{The following query gives the wrong results.}} {{WITH people as (}} {{ SELECT * FROM (VALUES }} {{ (1, 'Peter'), }} {{ (2, 'Homer'), }} {{ (3, 'Ned'),}} {{ (3, 'Jenny')}} {{ ) AS Idiots(id, FirstName)}} {{{}){}}}{{{}, location as ({}}} {{ SELECT * FROM (VALUES}} {{ (1, 'sample0'),}} {{ (1, 'sample1'),}} {{ (2, 'sample2') }} {{ ) as Locations(id, address)}} {{{}){}}}{{{}SELECT{}}} {{ *}} {{FROM}} {{ people}} {{FULL OUTER JOIN}} {{ location}} {{ON}} {{ people.id = location.id}} {{We find the following table:}} {{}} ||idinteger||FirstNamestring||idinteger||addressstring|| |2|Homer|2|sample2| |null|Ned|null|null| |null|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| {{But clearly the first `id` table is wrong, the nulls should be 3.}} If we rename the id column in (only) the person table to pid we get the correct results: ||pidinteger||FirstNamestring||idinteger||addressstring|| |2|Homer|2|sample2| |3|Ned|null|null| |3|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| > Spark SQL returning incorrect values for full outer join on keys with the > same name. > ------------------------------------------------------------------------------------ > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.5.0 > Reporter: Huw > Priority: Major > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` table is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org