[jira] [Commented] (SPARK-23439) Ambiguous reference when selecting column inside StructType with same name that outer colum

Wenchen Fan (JIRA) Fri, 16 Feb 2018 06:51:31 -0800

    [ https://issues.apache.org/jira/browse/SPARK-23439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367414#comment-16367414 ]


Wenchen Fan commented on SPARK-23439:
-------------------------------------

This is valid behavior, as `a.b` is an invalid column name for most external 
storage formats, such as Parquet. I think it's reasonable to name the nested 
field according to the deepest field. Users should manually alias the column to 
avoid duplicate names before saving data to external storage.
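
As a minimal sketch of the manual aliasing suggested above, reusing the case 
classes from the report: the alias name `b_a`, the local SparkSession setup, 
and the app name are illustrative assumptions, not part of the original issue.

```scala
import org.apache.spark.sql.SparkSession

// Same shapes as in the report: an outer `a` and a nested `b.a`.
case class Bar(a: Int)
case class Foo(a: Int, b: Bar)

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alias-nested-field")
  .getOrCreate()
import spark.implicits._

val df = spark.createDataFrame(List(Foo(1, Bar(1)), Foo(2, Bar(2))))

// Alias the nested field explicitly so the two output columns get distinct names.
// `b_a` is a hypothetical name chosen here; any unambiguous name works.
val aliased = df.select($"a", $"b.a".as("b_a"))

aliased.printSchema()

// Later references are no longer ambiguous.
aliased.select($"b_a").show()
```

With distinct names in place, the result can also be written to external 
storage without two columns sharing the name `a`.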

> Ambiguous reference when selecting column inside StructType with same name 
> that outer colum
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23439
>                 URL: https://issues.apache.org/jira/browse/SPARK-23439
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: Scala 2.11.8, Spark 2.2.0
>            Reporter: Alejandro Trujillo Caballero
>            Priority: Minor
>
> Hi.
> I've seen that when working with nested struct fields in a DataFrame, a 
> select operation loses the nesting, which can result in collisions between 
> column names.
> For example:
>  
> {code:java}
> case class Foo(a: Int, b: Bar)
> case class Bar(a: Int)
> val items = List(
>   Foo(1, Bar(1)),
>   Foo(2, Bar(2))
> )
> // Assumes a spark-shell session, where spark and spark.implicits._ are in scope
> val df = spark.createDataFrame(items)
> df.select($"a", $"b.a").show
> //+---+---+
> //|  a|  a|
> //+---+---+
> //|  1|  1|
> //|  2|  2|
> //+---+---+
> df.select($"a", $"b.a").printSchema
> //root
> //|-- a: integer (nullable = false)
> //|-- a: integer (nullable = true)
> df.select($"a", $"b.a").select($"a")
> //org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a#9, a#
> {code}
>  
>  
> Shouldn't the second column be named "b.a"?
>  
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
