I'm loading JSON into Spark to create a SchemaRDD (sqlContext.jsonRDD(...)). I'd like some of the JSON fields to be typed as a MapType rather than a nested StructType, since the keys will be very sparse.
For example:

> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> val jsonRdd = sc.parallelize(Seq(
>   """{"key": "1234", "attributes": {"gender": "m"}}""",
>   """{"key": "4321", "attributes": {"location": "nyc"}}"""))
> val schemaRdd = sqlContext.jsonRDD(jsonRdd)
> schemaRdd.printSchema
root
 |-- attributes: struct (nullable = true)
 |    |-- gender: string (nullable = true)
 |    |-- location: string (nullable = true)
 |-- key: string (nullable = true)
> schemaRdd.collect
res1: Array[org.apache.spark.sql.Row] = Array([[m,null],1234], [[null,nyc],4321])

However, this isn't what I want, so I created my own StructType to pass to the jsonRDD call:

> import org.apache.spark.sql._
> val st = StructType(Seq(
>   StructField("key", StringType, false),
>   StructField("attributes", MapType(StringType, StringType, false))))
> val jsonRddSt = sc.parallelize(Seq(
>   """{"key": "1234", "attributes": {"gender": "m"}}""",
>   """{"key": "4321", "attributes": {"location": "nyc"}}"""))
> val schemaRddSt = sqlContext.jsonRDD(jsonRddSt, st)
> schemaRddSt.printSchema
root
 |-- key: string (nullable = false)
 |-- attributes: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
> schemaRddSt.collect
*** Failure ***
scala.MatchError: MapType(StringType,StringType,false) (of class org.apache.spark.sql.catalyst.types.MapType)
    at org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:397)
    ...

The schema of the SchemaRDD is correct, but it seems the JSON values cannot be coerced to a MapType: looking at the line in the stack trace, enforceCorrectType has no case for MapType. Is there something I'm missing? Is this a bug, or a deliberate decision not to support MapType with JSON?

Thanks,
Brian
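
P.S. In the meantime I'm working around it by parsing the JSON myself and attaching the schema with applySchema. This is just a rough sketch for my two-field example above (it reuses jsonRdd from the first snippet, uses scala.util.parsing.json from the Scala standard library for brevity, though any JSON parser would do, and assumes a Spark version that has SQLContext.applySchema):

> import org.apache.spark.sql._
> import scala.util.parsing.json.JSON
> val st = StructType(Seq(
>   StructField("key", StringType, false),
>   StructField("attributes", MapType(StringType, StringType, false))))
> // Parse each line into a Scala Map ourselves, then build Rows that
> // carry the attributes as a Map[String, String] instead of a struct.
> // Records that fail to parse are silently dropped by the flatMap.
> val rowRdd = jsonRdd.flatMap { line =>
>   JSON.parseFull(line).map { parsed =>
>     val fields = parsed.asInstanceOf[Map[String, Any]]
>     Row(fields("key").asInstanceOf[String],
>         fields.getOrElse("attributes", Map.empty)
>               .asInstanceOf[Map[String, String]])
>   }
> }
> val schemaRdd = sqlContext.applySchema(rowRdd, st)

This sidesteps enforceCorrectType entirely, at the cost of doing the JSON parsing by hand.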