[jira] [Assigned] (SPARK-4782) Add inferSchema support for RDD[Map[String, Any]]
[ https://issues.apache.org/jira/browse/SPARK-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-4782:
-----------------------------------

    Assignee:     (was: Apache Spark)

> Add inferSchema support for RDD[Map[String, Any]]
> -------------------------------------------------
>
>                 Key: SPARK-4782
>                 URL: https://issues.apache.org/jira/browse/SPARK-4782
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Jianshi Huang
>            Priority: Minor
>
> The best way to convert RDD[Map[String, Any]] to SchemaRDD currently seems to
> be converting each Map to a JSON string first and using JsonRDD.inferSchema on it.
> It's very inefficient.
> Instead of JsonRDD, RDD[Map[String, Any]] is a better common denominator for
> schemaless data, as adding a Map-like interface to any serialization format is
> easy.
> So please add inferSchema support to RDD[Map[String, Any]]. *Then for any new
> serialization format we want to support, we just need to add a Map interface
> wrapper to it.*
> Jianshi

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-4782) Add inferSchema support for RDD[Map[String, Any]]
[ https://issues.apache.org/jira/browse/SPARK-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-4782:
-----------------------------------

    Assignee: Apache Spark

> Add inferSchema support for RDD[Map[String, Any]]
> -------------------------------------------------
>
>                 Key: SPARK-4782
>                 URL: https://issues.apache.org/jira/browse/SPARK-4782
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Jianshi Huang
>            Assignee: Apache Spark
>            Priority: Minor
>
> The best way to convert RDD[Map[String, Any]] to SchemaRDD currently seems to
> be converting each Map to a JSON string first and using JsonRDD.inferSchema on it.
> It's very inefficient.
> Instead of JsonRDD, RDD[Map[String, Any]] is a better common denominator for
> schemaless data, as adding a Map-like interface to any serialization format is
> easy.
> So please add inferSchema support to RDD[Map[String, Any]]. *Then for any new
> serialization format we want to support, we just need to add a Map interface
> wrapper to it.*
> Jianshi
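The request above amounts to inferring a schema directly from the Map values instead of serializing every record to a JSON string and parsing it back. The following is a minimal, standalone sketch of that idea. It is not Spark's API and makes the following assumptions:

- The schema is flat, with no nested maps or arrays.
- Int and Long values map to a long type.
- Mixing a long type with Double widens the field to double.
- Any other type conflict falls back to string.
- A field is nullable if it is missing from, or null in, any record.

`FieldType`, `Field`, and `MapSchemaInference.infer` are hypothetical names. The case objects are local stand-ins, not the classes in `org.apache.spark.sql.types`.

```scala
// Hypothetical sketch: infer a flat schema from Map[String, Any] records
// without a JSON round trip. Illustrative only; not Spark's API.
sealed trait FieldType
case object LongType extends FieldType    // local stand-in, not Spark's LongType
case object DoubleType extends FieldType
case object BooleanType extends FieldType
case object StringType extends FieldType
case object NullType extends FieldType    // only null values seen so far

final case class Field(name: String, dataType: FieldType, nullable: Boolean)

object MapSchemaInference {
  // Map a single runtime value to a field type.
  def typeOf(v: Any): FieldType = v match {
    case null                                  => NullType
    case _: Int | _: Long | _: Short | _: Byte => LongType
    case _: Float | _: Double                  => DoubleType
    case _: Boolean                            => BooleanType
    case _                                     => StringType // assumption: fallback
  }

  // Combine two types observed for the same key (assumed widening rules).
  def widen(a: FieldType, b: FieldType): FieldType = (a, b) match {
    case (x, y) if x == y                                => x
    case (NullType, y)                                   => y
    case (x, NullType)                                   => x
    case (LongType, DoubleType) | (DoubleType, LongType) => DoubleType
    case _                                               => StringType
  }

  // Fold over all records, tracking per key the widened type and the number
  // of records holding a non-null value; fewer than all records => nullable.
  def infer(records: Seq[Map[String, Any]]): Seq[Field] = {
    val total = records.size
    val stats = records.foldLeft(Map.empty[String, (FieldType, Int)]) { (acc, rec) =>
      rec.foldLeft(acc) { case (a, (k, v)) =>
        val (prevType, prevCount) = a.getOrElse(k, (NullType, 0))
        val present = if (v == null) 0 else 1
        a.updated(k, (widen(prevType, typeOf(v)), prevCount + present))
      }
    }
    // Sort by key so the resulting field order is deterministic.
    stats.toSeq.sortBy(_._1).map { case (k, (t, n)) =>
      Field(k, t, nullable = n < total)
    }
  }
}
```

For example, records `Map("id" -> 1, "score" -> 1.5)` and `Map("id" -> 2L, "score" -> 3)` yield a non-nullable long `id` and a double `score`. The per-key state is a mergeable summary, so in Spark the same logic could in principle run per partition and be combined, e.g. with `RDD.aggregate`. That part is an assumption and is not shown here.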