[jira] [Commented] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

Simeon Simeonov (JIRA) Thu, 23 Feb 2017 14:53:24 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881490#comment-15881490
 ]


Simeon Simeonov commented on SPARK-19716:
-----------------------------------------

This is an important issue because it prevents, for datasets, the kind of schema 
evolution that {{mergeSchema=true}} makes possible for dataframes. This means two things:

1. Customers currently using dataframes with non-trivial schemas may not be able 
to migrate to datasets.
2. Customers who migrate to (or start with) datasets may be unable to evolve 
their schemas.

> Dataset should allow by-name resolution for struct type elements in array
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19716
>                 URL: https://issues.apache.org/jira/browse/SPARK-19716
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it 
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: the {{a}} and 
> {{c}} columns are extracted to build the {{Data}} objects.
> However, if the struct is inside an array, e.g. the schema is {{arr: array<struct<a: 
> int, b: int, c: int>>}}, and we want to convert it to a Dataset with {{case class 
> ComplexData(arr: Seq[Data])}}, the conversion fails. The reason is that, to allow 
> compatible types, e.g. converting {{a: int}} to {{case class A(a: Long)}}, we 
> add a cast for each field except struct type fields, because struct types 
> are flexible: the number of columns can mismatch. We should probably also skip 
> the cast for array and map types.
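
The two cases in the description can be reproduced with a small sketch along 
the following lines. It assumes a local SparkSession and a Spark SQL 
dependency on the classpath. {{Data}} and {{ComplexData}} come from the 
description, while {{Full}}, {{FullArr}}, and {{ByNameResolutionSketch}} are 
hypothetical helper names introduced only to build the input rows. The nested 
conversion is wrapped in {{Try}} because its outcome depends on whether the 
Spark build in use still has the limitation reported here.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Target types taken from the issue description.
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

// Hypothetical helper types, used only to create input with the wider schema.
case class Full(a: Int, b: Int, c: Int)
case class FullArr(arr: Seq[Full])

object ByNameResolutionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-19716-sketch")
      .getOrCreate()
    import spark.implicits._

    // Case 1: top-level columns a, b, c. Per the description, converting to
    // Data resolves a and c by name and ignores the extra column b.
    val flat = Seq(Full(1, 2, 3)).toDF()
    println(flat.as[Data].collect().toSeq)

    // Case 2: the same struct nested in an array. This is the conversion the
    // issue reports as failing; Try captures either outcome.
    val nested = Seq(FullArr(Seq(Full(1, 2, 3)))).toDF()
    println(Try(nested.as[ComplexData].collect().toSeq))

    spark.stop()
  }
}
```

On a build with the limitation, the second {{println}} would be expected to 
show a {{Failure}}; on a build where by-name resolution also applies inside 
arrays, it should show a {{Success}}.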



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
