[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-04-04 Thread Cheng Lian (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-19716:
---
Fix Version/s: (was: 2.3.0)
   2.2.0

> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
> the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is
> {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to
> a Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The
> reason is that, to allow compatible types, e.g. converting {{a: int}} to
> {{case class A(a: Long)}}, we add a cast for each field except struct type
> fields, because a struct type is flexible: the number of columns can
> mismatch. We should probably also skip the cast for array and map types.
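
For readers who want to try the two cases from the description, here is a minimal sketch. It assumes a local Spark session and that the array schema in the description is an array of structs with fields `a`, `b`, `c`. The object name `ByNameResolutionSketch` and the helpers `flatCase` and `nestedCase` are made up for illustration; the case classes come from the issue text. Given the Fix Version above, the nested case is expected to resolve only on 2.2.0 or later and to fail during analysis on earlier versions.

```scala
import org.apache.spark.sql.SparkSession

// Case classes from the issue description, declared at top level so that
// Spark can derive encoders for them.
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

// Hypothetical helper object, not part of Spark.
object ByNameResolutionSketch {

  // Top-level case: columns a, b, c; Data picks `a` and `c` by name.
  def flatCase(spark: SparkSession): Seq[Data] = {
    import spark.implicits._
    Seq((1, 2, 3)).toDF("a", "b", "c").as[Data].collect().toSeq
  }

  // Nested case: `arr` is an array of structs with fields a, b, c
  // (assumed schema), converted to ComplexData(arr: Seq[Data]).
  def nestedCase(spark: SparkSession): Seq[ComplexData] = {
    import spark.implicits._
    spark
      .sql("SELECT array(named_struct('a', 1, 'b', 2, 'c', 3)) AS arr")
      .as[ComplexData]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-19716-sketch")
      .getOrCreate()
    println(flatCase(spark))
    println(nestedCase(spark))
    spark.stop()
  }
}
```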



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-02-23 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-19716:

Description: 
If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is
{{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a
Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The reason
is that, to allow compatible types, e.g. converting {{a: int}} to
{{case class A(a: Long)}}, we add a cast for each field except struct type
fields, because a struct type is flexible: the number of columns can mismatch.
We should probably also skip the cast for array and map types.
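
The "compatible types" case mentioned above can be sketched as follows, assuming a local Spark session. The object name `CompatibleTypeSketch` and the helper `widen` are hypothetical; `case class A(a: Long)` is the example from the description. The column `a` is an int while the target field is a Long, which is the situation where a cast on the field is needed.

```scala
import org.apache.spark.sql.SparkSession

// Example class from the description: the field is wider than the column.
case class A(a: Long)

// Hypothetical helper object, not part of Spark.
object CompatibleTypeSketch {

  // Builds a one-column DataFrame with an int column `a` and converts it
  // to Dataset[A]; the int value is widened to Long on the way.
  def widen(spark: SparkSession): Seq[A] = {
    import spark.implicits._
    Seq(1).toDF("a").as[A].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("compatible-type-sketch")
      .getOrCreate()
    println(widen(spark))
    spark.stop()
  }
}
```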

  was:
If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is
{{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a
Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The reason
is that we add a cast for each field except struct type fields, because a
struct type is flexible: the number of columns can mismatch. We should
probably also skip the cast for array and map types.


> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
> the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is
> {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to
> a Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The
> reason is that, to allow compatible types, e.g. converting {{a: int}} to
> {{case class A(a: Long)}}, we add a cast for each field except struct type
> fields, because a struct type is flexible: the number of columns can
> mismatch. We should probably also skip the cast for array and map types.




[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-02-23 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-19716:

Description: 
If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is
{{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a
Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The reason
is that we add a cast for each field except struct type fields, because a
struct type is flexible: the number of columns can mismatch. We should
probably also skip the cast for array and map types.

  was:
If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is
{{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a
Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. We should
support this case.


> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
> the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is
> {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to
> a Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. The
> reason is that we add a cast for each field except struct type fields,
> because a struct type is flexible: the number of columns can mismatch. We
> should probably also skip the cast for array and map types.




[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-02-23 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-19716:

Description: 
If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is
{{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to a
Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. We should
support this case.

  was:
If we have a DataFrame with schema {{}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is {{>>}}, and we
want to convert it to a Dataset with
{{case class ComplexData(arr: Seq[Data])}}, it fails. We should support this
case.


> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{a: int, b: int, c: int}} and convert it
> to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
> the `a` and `c` columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is
> {{arr: array<struct<a: int, b: int, c: int>>}}, and we want to convert it to
> a Dataset with {{case class ComplexData(arr: Seq[Data])}}, it fails. We
> should support this case.




[jira] [Updated] (SPARK-19716) Dataset should allow by-name resolution for struct type elements in array

2017-02-23 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-19716:

Description: 
If we have a DataFrame with schema {{}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is {{>>}}, and we
want to convert it to a Dataset with
{{case class ComplexData(arr: Seq[Data])}}, it fails. We should support this
case.

  was:
If we have a DataFrame with schema {{}} and convert it
to a Dataset with {{case class Data(a: Int, c: Int)}}, it works: we extract
the `a` and `c` columns to build the Data.

However, if the struct is inside an array, e.g. the schema is {{>}}, and we
want to convert it to a Dataset with
{{case class ComplexData(arr: Seq[Data])}}, it fails. We should support this
case.


> Dataset should allow by-name resolution for struct type elements in array
> -
>
> Key: SPARK-19716
> URL: https://issues.apache.org/jira/browse/SPARK-19716
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>
> If we have a DataFrame with schema {{}} and convert it to a Dataset with
> {{case class Data(a: Int, c: Int)}}, it works: we extract the `a` and `c`
> columns to build the Data.
> However, if the struct is inside an array, e.g. the schema is {{ array>}},
> and we want to convert it to a Dataset with
> {{case class ComplexData(arr: Seq[Data])}}, it fails. We should support
> this case.



