[ https://issues.apache.org/jira/browse/SPARK-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tycho Grouwstra updated SPARK-11431:
------------------------------------
Description:

I am creating DataFrames from some [JSON data|http://www.kayak.com/h/explore/api?airport=AMS] and would like to explode an array of structs (which are common in JSON) into separate rows, so that I can start analyzing the data using GraphX. I believe many others could use this as well, since so much web data is in JSON format.

This feature would build on the existing `explode` functionality that [~marmbrus] added to DataFrames, which currently fails when called on such arrays of `InternalRow`s. This relates to `explode`'s use of the `schemaFor` function to infer the output column types. That approach is insufficient for rows, because the `Row` type does not carry the field names and types needed to build a schema. The alternative would be to take the schema information from the DataFrame's existing schema in such cases.

I'm trying to implement a patch that adds this functionality, so stay tuned until I've figured that out. I'm new here, though, so any feedback would be welcome.

> Allow exploding arrays of structs in DataFrames
> -----------------------------------------------
>
>                 Key: SPARK-11431
>                 URL: https://issues.apache.org/jira/browse/SPARK-11431
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Tycho Grouwstra
>              Labels: features
>   Original Estimate: 24h
>  Remaining Estimate: 24h
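To illustrate the approach proposed in the description, here is a minimal, self-contained Scala sketch; it is not a patch against Spark. It assumes simplified, hypothetical stand-ins (`DataType`, `Field`, `StructT`, `ArrayT`) for Spark SQL's `StructField`, `StructType` and `ArrayType`, and the helper names (`ExplodeSketch`, `explodedSchema`, `explodeRows`) and sample columns are illustrative only. What it shows is that when the input schema declares a column as an array of structs, the output column names and types of an explode can be read straight from that schema, which is exactly the information `schemaFor` cannot recover from the `Row` type.

```scala
// Hypothetical, simplified stand-ins for Spark SQL's StructType, ArrayType
// and StructField. These are NOT Spark's classes; they model just enough of
// a schema to show where the exploded column types can come from.
sealed trait DataType
case object StringT extends DataType
case object DoubleT extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructT(fields: Seq[Field]) extends DataType
final case class ArrayT(elementType: DataType) extends DataType

object ExplodeSketch {
  // Output schema of exploding `column`: the array column is replaced by the
  // fields of its struct element type. The element type is read from the
  // input schema, so no reflection on a Scala type is needed.
  def explodedSchema(schema: StructT, column: String): StructT = {
    val idx = schema.fields.indexWhere(_.name == column)
    require(idx >= 0, s"no column named $column")
    schema.fields(idx).dataType match {
      case ArrayT(StructT(elemFields)) =>
        StructT(schema.fields.patch(idx, elemFields, 1))
      case other =>
        throw new IllegalArgumentException(
          s"$column is not an array of structs: $other")
    }
  }

  // Output rows: one row per array element, with the element's values spliced
  // in where the array column was. Rows and struct values are plain Seq[Any].
  def explodeRows(schema: StructT, column: String,
                  rows: Seq[Seq[Any]]): Seq[Seq[Any]] = {
    explodedSchema(schema, column) // fail early if the column is unsuitable
    val idx = schema.fields.indexWhere(_.name == column)
    for {
      row  <- rows
      elem <- row(idx).asInstanceOf[Seq[Seq[Any]]]
    } yield row.patch(idx, elem, 1)
  }
}

// Hypothetical sample data; the column names are illustrative only and are
// not taken from the Kayak API.
object ExampleData {
  val schema: StructT = StructT(Seq(
    Field("airport", StringT),
    Field("destinations",
      ArrayT(StructT(Seq(Field("city", StringT), Field("price", DoubleT)))))))

  val rows: Seq[Seq[Any]] = Seq(
    Seq("AMS", Seq(Seq("Paris", 99.0), Seq("Rome", 120.0))))
}
```

Exploding `destinations` here yields one row per destination, with `city` and `price` as top-level columns next to `airport`. A real patch would take the element type from the DataFrame's own `StructType` schema (an `ArrayType` whose element type is a `StructType`) rather than from these toy types.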