You might find these blogs helpful for parsing and extracting data from complex structures:
https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html
https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html

Cheers
Jules

> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth...@gmail.com> wrote:
>
> I have a Dataset<String> ds which consists of json rows.
>
> Sample Json Row (This is just an example of one row in the dataset)
>
> [
>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}
>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert into the following dataset using Spark 2.2.0
>
> name  | address                           | docs
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java but Scala is also fine as long as there are functions
> available in Java API