[ https://issues.apache.org/jira/browse/SPARK-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-5443:
-----------------------------------

    Assignee:     (was: Apache Spark)

> jsonRDD with schema should ignore sub-objects that are omitted in schema
> ------------------------------------------------------------------------
>
>                 Key: SPARK-5443
>                 URL: https://issues.apache.org/jira/browse/SPARK-5443
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Derrick Burns
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Reading the code for jsonRDD, it appears that all fields of a JSON object are
> read into a ROW independent of the provided schema. I would expect it to be
> more efficient to only store in the ROW those fields that are explicitly
> included in the schema.
> For example, assume that I only wish to extract the "id" field of a tweet.
> If I provided a schema that simply had one field within a map named "id",
> then the row object would only store that field within a map.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
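The behaviour requested in the issue (keep only the schema's fields when building a row) can be illustrated with a small, self-contained Python sketch. It is not Spark code: the schema representation (a list of `(name, sub_schema)` pairs), the helper names `project` and `rows_from_json_lines`, and the choice to fill missing fields with `None` are all hypothetical, chosen only for illustration. Note that this sketch still parses each whole document with `json.loads` and only prunes afterwards; an implementation inside Spark could go further and skip unneeded fields while parsing with a streaming parser.

```python
import json
from typing import Any, Iterable, Iterator, List, Optional, Tuple

# Hypothetical schema representation (NOT Spark's StructType): a list of
# (field_name, sub_schema) pairs. sub_schema is None for a leaf value, or
# another such list describing a nested JSON object.
Schema = List[Tuple[str, Optional["Schema"]]]


def project(value: Any, schema: Schema) -> Tuple[Any, ...]:
    """Keep only the fields named in `schema`, in schema order.

    Fields missing from the JSON object become None (an assumption made
    for this sketch); fields present in the JSON but absent from the
    schema are never copied into the resulting row.
    """
    if not isinstance(value, dict):
        # The schema expects an object here; treat every field as missing.
        return tuple(None for _ in schema)
    row = []
    for name, sub_schema in schema:
        field = value.get(name)
        if sub_schema is not None and field is not None:
            # Recurse into nested objects, again keeping only listed fields.
            field = project(field, sub_schema)
        row.append(field)
    return tuple(row)


def rows_from_json_lines(lines: Iterable[str], schema: Schema) -> Iterator[Tuple[Any, ...]]:
    """Parse one JSON document per line and project it onto `schema`."""
    for line in lines:
        yield project(json.loads(line), schema)


# Usage: only "id" and "user.id" end up in the row; "text",
# "user.name" and "user.followers" are dropped.
tweet = '{"id": 42, "text": "hello", "user": {"id": 7, "name": "x", "followers": 3}}'
schema = [("id", None), ("user", [("id", None)])]
print(list(rows_from_json_lines([tweet], schema)))  # prints [(42, (7,))]
```

Returning tuples in schema order mirrors how a positional row would hold only the requested columns, so memory use scales with the schema rather than with the size of each JSON object.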