[
https://issues.apache.org/jira/browse/SPARK-33871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254224#comment-17254224
]
L. C. Hsieh commented on SPARK-33871:
-
For a self-join, Spark adds aliases to the ambiguous columns in the join query.
But semiJoin is itself a query over df, so its column col still refers to
df.col. As a result, left.select(semiJoin(col)) and left.select(df(col)) select
the same column.
If you want to access the column col of the semi join in the left join, a
workaround is to give the semi join a relation alias and access col through
that alias.
{code}
scala> val semiJoin = df.join(df2, df(col) === df2(col),
"left_semi").as("left_semi")
scala> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
scala> left.select("left_semi.c1").show
+----+
|  c1|
+----+
|   1|
|null|
|null|
|null|
+----+
{code}
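The same idea can be taken one step further by aliasing both sides of the left join and referring to every column by its qualified name. The following is only a minimal sketch, assuming a local SparkSession. The object name AliasedSelfJoinSketch, the helper build, the alias "base", and the output names base_c1 and semi_c1 are hypothetical and not part of the original report.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object; only the join logic mirrors the issue.
object AliasedSelfJoinSketch {

  // Builds the semi join, then left-joins it back onto df.
  // Both sides carry a relation alias, so columns are addressed as
  // "base.c1" and "left_semi.c1" instead of df(col) / semiJoin(col).
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val key = "c1"
    val df  = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a")).toDF(key, "c2")
    val df2 = Seq(1).toDF(key)

    val base = df.as("base") // alias name is an assumption
    val semi = df.join(df2, df(key) === df2(key), "left_semi").as("left_semi")

    base
      .join(semi, col("base.c1") === col("left_semi.c1"), "left")
      .select(col("base.c1").as("base_c1"), col("left_semi.c1").as("semi_c1"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-33871-alias-sketch")
      .getOrCreate()
    try build(spark).show()
    finally spark.stop()
  }
}
```

With qualified names, the column from the semi join side stays distinguishable from the column of df, so semi_c1 should be null for the rows that the semi join filtered out.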
> Cannot access to column after left semi join and left join
> ---
>
> Key: SPARK-33871
> URL: https://issues.apache.org/jira/browse/SPARK-33871
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 3.0.0
>Reporter: Evgenii Samusenko
>Priority: Minor
>
> Cannot access to column after left semi join and left join
> {code}
> val col = "c1"
> val df = Seq((1, "a"),(2, "a"),(3, "a"),(4, "a")).toDF(col, "c2")
> val df2 = Seq(1).toDF(col)
> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi")
> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
> left.show
> +---+---+----+----+
> | c1| c2|  c1|  c2|
> +---+---+----+----+
> |  1|  a|   1|   a|
> |  2|  a|null|null|
> |  3|  a|null|null|
> |  4|  a|null|null|
> +---+---+----+----+
> left.select(semiJoin(col))
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> left.select(df(col))
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> {code}