[ https://issues.apache.org/jira/browse/SPARK-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567279#comment-14567279 ]
Jianshi Huang commented on SPARK-4782:
--------------------------------------

Thanks, Luca, for the clever fix!

I also noticed that the schema inference in JsonRDD is too JSON-specific, as JSON's set of data types is quite limited.

Jianshi

> Add inferSchema support for RDD[Map[String, Any]]
> -------------------------------------------------
>
>                 Key: SPARK-4782
>                 URL: https://issues.apache.org/jira/browse/SPARK-4782
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Jianshi Huang
>            Priority: Minor
>
> The best way to convert RDD[Map[String, Any]] to SchemaRDD currently seems to
> be converting each Map to a JSON String first and then using
> JsonRDD.inferSchema on it. This is very inefficient.
> Instead of JsonRDD, RDD[Map[String, Any]] is a better common denominator for
> schemaless data, as adding a Map-like interface to any serialization format is
> easy.
> So please add inferSchema support to RDD[Map[String, Any]]. *Then for any new
> serialization format we want to support, we just need to add a Map interface
> wrapper to it.*
> Jianshi
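
To make the request concrete, here is a rough sketch of what inferring a schema directly from Map[String, Any] records could look like, without going through JSON strings. It is plain Scala with no Spark dependency; FieldType, MapSchemaInference, typeOf, merge and inferSchema are hypothetical names made up for illustration, not Spark APIs, and the widening rules (integers widen to doubles, any other conflict falls back to string) are an assumption rather than what JsonRDD does.

```scala
// Hypothetical sketch: illustrative types only, not part of Spark.
sealed trait FieldType
case object NullT extends FieldType
case object LongT extends FieldType
case object DoubleT extends FieldType
case object BooleanT extends FieldType
case object StringT extends FieldType

object MapSchemaInference {
  // Classify a single runtime value. Anything unrecognised is treated as a string.
  def typeOf(value: Any): FieldType = value match {
    case null                                  => NullT
    case _: Byte | _: Short | _: Int | _: Long => LongT
    case _: Float | _: Double                  => DoubleT
    case _: Boolean                            => BooleanT
    case _                                     => StringT
  }

  // Widen two observed types to a common type (assumed rules, see above).
  def merge(a: FieldType, b: FieldType): FieldType = (a, b) match {
    case (x, y) if x == y                    => x
    case (NullT, y)                          => y
    case (x, NullT)                          => x
    case (LongT, DoubleT) | (DoubleT, LongT) => DoubleT
    case _                                   => StringT
  }

  // Fold over all records, merging the type seen for each key.
  def inferSchema(records: Iterable[Map[String, Any]]): Map[String, FieldType] =
    records.foldLeft(Map.empty[String, FieldType]) { (schema, record) =>
      record.foldLeft(schema) { case (acc, (key, value)) =>
        val observed = typeOf(value)
        acc.updated(key, acc.get(key).map(merge(_, observed)).getOrElse(observed))
      }
    }
}
```

Since merge is associative and commutative under these rules, the same fold could presumably be run per partition and combined afterwards (for example with RDD.aggregate), which is the part that avoids the JSON round trip.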