You can do want with lateral view explode, but what seems to be missing is that jsonRDD converts json objects into structs (fixed keys with a fixed order) and fields in a struct are accessed using a `.`
val myJson = sqlContext.jsonRDD(sc.parallelize("""{"foo":[{"bar":1},{"baz":2}]}""" :: Nil)) myJson.registerTempTable("JsonTest")val result = sql("SELECT f.bar FROM JsonTest LATERAL VIEW explode(foo) a AS f").collect() myJson: org.apache.spark.sql.DataFrame = [foo: array<struct<bar:bigint,baz:bigint>>] result: Array[org.apache.spark.sql.Row] = Array([1], [null]) In Spark 1.3 you can also hint to jsonRDD that you'd like the json objects converted into Maps (non-uniform keys) instead of structs, by manually specifying the schema of your JSON. import org.apache.spark.sql.types._val schema = StructType( StructField("foo", ArrayType(MapType(StringType, IntegerType))) :: Nil) sqlContext.jsonRDD(sc.parallelize("""{"foo":[{"bar":1},{"baz":2}]}""" :: Nil), schema).registerTempTable("jsonTest") val withSql = sql("SELECT a FROM jsonTest LATERAL VIEW explode(foo) a AS a WHERE a['bar'] IS NOT NULL").collect() val withSpark = sql("SELECT a FROM jsonTest LATERAL VIEW explode(foo) a AS a").rdd.filter { case Row(a: Map[String, Int]) if a.contains("bar") => true case _: Row => false }.collect() schema: org.apache.spark.sql.types.StructType = StructType(StructField(foo,ArrayType(MapType(StringType,IntegerType,true),true),true)) withSql: Array[org.apache.spark.sql.Row] = Array([Map(bar -> 1)]) withSpark: Array[org.apache.spark.sql.Row] = Array([Map(bar -> 1)]) Michael On Thu, Mar 5, 2015 at 6:11 PM, Justin Pihony <justin.pih...@gmail.com> wrote: > Is there any plans of supporting JSON arrays more fully? Take for example: > > val myJson = > sqlContext.jsonRDD(List("""{"foo":[{"bar":1},{"baz":2}]}""")) > myJson.registerTempTable("JsonTest") > > I would like a way to pull out parts of the array data based on a key > > sql("""SELECT foo["bar"] FROM JsonTest""") //projects only the object > with bar, the rest would be null > > I could even work around this if there was some way to access the key name > from the SchemaRDD: > > myJson.filter(x=>x(0).asInstanceOf[Seq[Row]].exists(y=>y.key == "bar")) > .map(x=>x(0).asInstanceOf[Seq[Row]].filter(y=>y.key == "bar")) > //This does the same as above, except also filtering out those without > a > bar key > > This is the closest suggestion I could find thus far, > < > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView > > > which still does not solve the problem of pulling out the keys. > > I tried with a UDF also, but could not currently make that work either. > > If there isn't anything in the works, then would it be appropriate to > create > a ticket for this? > > Thanks, > Justin > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-JSON-array-support-tp21939.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >