Derrick Burns created SPARK-5443:
------------------------------------

             Summary: jsonRDD with schema should ignore sub-objects that are omitted in schema
                 Key: SPARK-5443
                 URL: https://issues.apache.org/jira/browse/SPARK-5443
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 1.2.0
            Reporter: Derrick Burns

Reading the code for jsonRDD, it appears that all fields of a JSON object are read into a Row regardless of the provided schema. I would expect it to be more efficient to store in the Row only those fields that are explicitly included in the schema.

For example, assume that I only wish to extract the "id" field of a tweet. If I provide a schema that has just one field within a map, named "id", then I would expect the Row object to store only that field within a map.
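
To make the requested behavior concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect how jsonRDD is implemented. The `project` function and the dict-based schema representation are hypothetical, chosen only for illustration, and arrays of sub-objects are not handled.

```python
import json

# Hypothetical schema representation (an assumption, not Spark's StructType):
# a dict mapping each wanted field name to None (keep the value as-is)
# or to a nested dict (recurse into that sub-object).
def project(obj, schema):
    """Return a new dict holding only the fields of obj named in schema."""
    row = {}
    for name, sub_schema in schema.items():
        value = obj.get(name)  # a field absent from the JSON becomes None
        if isinstance(sub_schema, dict) and isinstance(value, dict):
            # Prune the sub-object as well, so omitted sub-fields are dropped.
            value = project(value, sub_schema)
        row[name] = value
    return row

# A made-up tweet-like record, used only as sample input.
tweet = json.loads(
    '{"id": 42, "text": "hello",'
    ' "user": {"id": 7, "name": "example", "followers": 10}}'
)

id_only = project(tweet, {"id": None})
nested = project(tweet, {"id": None, "user": {"id": None}})
print(id_only)
print(nested)
```

With a schema naming only "id", the resulting record holds that single field. The "text" and "user" values are never copied, which is the saving this issue asks for.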