Yves Li created SPARK-38603:
-------------------------------

             Summary: Qualified star selection produces duplicated common columns after join then alias
                 Key: SPARK-38603
                 URL: https://issues.apache.org/jira/browse/SPARK-38603
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
         Environment: OS: Ubuntu 18.04.5 LTS
Scala version: 2.12.15
            Reporter: Yves Li


When two DataFrames are joined on a common column and the result is then aliased, selecting columns 
from the resulting Dataset with a qualified star (e.g. {{joined.*}}) produces duplicates of the 
common join column.
{code:scala}
scala> val df1 = Seq((1, 10), (2, 20)).toDF("a", "x")
df1: org.apache.spark.sql.DataFrame = [a: int, x: int]

scala> val df2 = Seq((2, 200), (3, 300)).toDF("a", "y")
df2: org.apache.spark.sql.DataFrame = [a: int, y: int]

scala> val joined = df1.join(df2, "a").alias("joined")
joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [a: int, x: int ... 1 more field]

scala> joined.select("*").show()
+---+---+---+
|  a|  x|  y|
+---+---+---+
|  2| 20|200|
+---+---+---+

scala> joined.select("joined.*").show()
+---+---+---+---+
|  a|  a|  x|  y|
+---+---+---+---+
|  2|  2| 20|200|
+---+---+---+---+

scala> joined.select("*").select("joined.*").show()
+---+---+---+
|  a|  x|  y|
+---+---+---+
|  2| 20|200|
+---+---+---+
{code}
This appears to have been introduced by SPARK-34527 and leads to surprising 
behaviour: with an earlier version, such as Spark 3.0.2, all three {{show()}} 
calls produce the same output.
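To compare versions outside the shell, the steps above can be packaged as a standalone program. This is only a sketch, assuming spark-sql is on the classpath (e.g. 3.2.0 to observe the duplication, 3.0.2 for comparison) and a local master; the object {{Spark38603Repro}} and the helper {{selectColumns}} are hypothetical names, not part of Spark.

{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical standalone reproduction of SPARK-38603.
// Assumes spark-sql is on the classpath and runs against a local master.
object Spark38603Repro {

  // Builds the aliased join from the report and returns the column names of
  // select("*"), select("joined.*") and select("*").select("joined.*").
  def selectColumns(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    import spark.implicits._
    val df1 = Seq((1, 10), (2, 20)).toDF("a", "x")
    val df2 = Seq((2, 200), (3, 300)).toDF("a", "y")
    // Join on the common column "a", then alias the joined result.
    val joined: DataFrame = df1.join(df2, "a").alias("joined")
    (
      joined.select("*").columns.toSeq,
      joined.select("joined.*").columns.toSeq,
      joined.select("*").select("joined.*").columns.toSeq
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-38603-repro")
      .getOrCreate()
    try {
      val (star, qualifiedStar, reselected) = selectColumns(spark)
      println(s"""select("*"): ${star.mkString(", ")}""")
      println(s"""select("joined.*"): ${qualifiedStar.mkString(", ")}""")
      println(s"""select("*").select("joined.*"): ${reselected.mkString(", ")}""")
      // All three lists are expected to agree; a mismatch indicates the bug.
      if (qualifiedStar != star)
        println(s"Mismatch: qualified star gave ${qualifiedStar.mkString(", ")}")
    } finally {
      spark.stop()
    }
  }
}
{code}

Running {{main}} once per Spark version prints the three column lists side by side, so a duplicated {{a}} in the qualified-star line stands out directly.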



--
This message was sent by Atlassian Jira
(v8.20.1#820001)