[ 
https://issues.apache.org/jira/browse/SPARK-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tycho Grouwstra updated SPARK-11431:
------------------------------------
    Description: 
I am creating DataFrames from some [JSON 
data|http://www.kayak.com/h/explore/api?airport=AMS] and would like to explode 
an array of structs (common in JSON) into their own rows, so that I can start 
analyzing the data using GraphX. I believe many others would find this useful 
as well, since much web data is in JSON format.

This feature would build upon the existing `explode` functionality added to 
DataFrames by [~marmbrus], which currently throws an error when called on such 
arrays of `InternalRow`s. The cause is that `explode` uses the `schemaFor` 
function to infer column types; this approach is insufficient for Rows, since 
their type does not carry the required information (field names and types). 
The alternative would be to take that information from the existing schema in 
such cases.
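
To illustrate the intended behaviour, here is a minimal plain-Python sketch 
(not Spark code). The tuple-based rows, the list-of-pairs schema model, the 
`explode_structs` helper and the sample field names are all assumptions made 
for illustration. The point is only that a positional row value carries no 
field names or types, so the output columns have to come from the element 
schema that is already stored alongside the data.

```python
from typing import Any, List, Tuple

# Hypothetical schema model: a struct is a list of (field name, type) pairs,
# and an array column has the type ("array", <element struct schema>).
Schema = List[Tuple[str, Any]]


def explode_structs(rows: List[tuple], schema: Schema, column: str):
    """Return (new_rows, new_schema) with one output row per array element.

    Rows are plain positional tuples, so nothing can be inferred from the
    values themselves; field names and types are read from `schema`.
    """
    names = [name for name, _ in schema]
    idx = names.index(column)
    col_type = schema[idx][1]
    if not (isinstance(col_type, tuple)
            and col_type[0] == "array"
            and isinstance(col_type[1], list)):
        raise TypeError(f"column {column!r} is not an array of structs")

    element_schema = col_type[1]  # taken from the existing schema
    new_schema = schema[:idx] + schema[idx + 1:] + element_schema

    new_rows = []
    for row in rows:
        rest = row[:idx] + row[idx + 1:]
        for element in row[idx]:  # an empty array yields no output rows
            new_rows.append(rest + tuple(element))
    return new_rows, new_schema


# Made-up sample data; the field names are not taken from the real API.
schema = [
    ("origin", "string"),
    ("destinations", ("array", [("city", "string"), ("price", "long")])),
]
rows = [
    ("AMS", [("Lisbon", 120), ("Oslo", 95)]),
    ("RTM", []),
]

new_rows, new_schema = explode_structs(rows, schema, "destinations")
print(new_schema)  # [('origin', 'string'), ('city', 'string'), ('price', 'long')]
print(new_rows)    # [('AMS', 'Lisbon', 120), ('AMS', 'Oslo', 95)]
```

In this sketch the exploded struct fields are flattened into top-level 
columns; keeping each element as a single struct column would be an equally 
reasonable choice.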

I'm trying to implement a patch that adds this functionality, so stay tuned 
until I've figured that out. I'm new here, though, so I would appreciate 
feedback.


  was:
I am creating DataFrames from some [JSON 
data](http://www.kayak.com/h/explore/api?airport=AMS), and would like to 
explode an array of structs (as are common in JSON) to their own rows so I 
could start analyzing the data using GraphX. I believe many others might have 
use for this as well, since most web data is in JSON format.

This feature would build upon the existing `explode` functionality added to 
DataFrames by [~marmbrus], which currently errors when you call it on such 
arrays of `InternalRow`s. This relates to `explode`'s use of the schemaFor 
function to infer column types -- this approach is insufficient in the case of 
Rows, since their type does not contain the required info. The alternative here 
would be to instead grab the schema info from the existing schema for such 
cases.

I'm trying to implement a patch that might add this functionality, so stay 
tuned until I've figured that out. I'm new here though so I'll probably have 
use for some feedback...



> Allow exploding arrays of structs in DataFrames
> -----------------------------------------------
>
>                 Key: SPARK-11431
>                 URL: https://issues.apache.org/jira/browse/SPARK-11431
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Tycho Grouwstra
>              Labels: features
>   Original Estimate: 24h
>  Remaining Estimate: 24h



