Huw created SPARK-45583: --------------------------- Summary: Spark SQL returning incorrect values for full outer join on keys with the same name. Key: SPARK-45583 URL: https://issues.apache.org/jira/browse/SPARK-45583 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Huw
{{The following query gives the wrong results.}} {{WITH people as (}} {{ SELECT * FROM (VALUES }} {{ (1, 'Peter'), }} {{ (2, 'Homer'), }} {{ (3, 'Ned'),}} {{ (3, 'Jenny')}} {{ ) AS Idiots(id, FirstName)}} {{{}){}}}{{{}, location as ({}}} {{ SELECT * FROM (VALUES}} {{ (1, 'sample0'),}} {{ (1, 'sample1'),}} {{ (2, 'sample2') }} {{ ) as Locations(id, address)}} {{{}){}}}{{{}SELECT{}}} {{ *}} {{FROM}} {{ people}} {{FULL OUTER JOIN}} {{ location}} {{ON}} {{ people.id = location.id}} {{We find the following table:}} {{}} ||idinteger||FirstNamestring||idinteger||addressstring|| |2|Homer|2|sample2| |null|Ned|null|null| |null|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| {{But clearly the first `id` table is wrong, the nulls should be 3.}} If we rename the id column in (only) the person table to pid we get the correct results: ||pidinteger||FirstNamestring||idinteger||addressstring|| |2|Homer|2|sample2| |3|Ned|null|null| |3|Jenny|null|null| |1|Peter|1|sample0| |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org