I have the following so far:

    private StructType getSchema() {
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", DataTypes.StringType)
            .add("docs", DataTypes.StringType);
    }
    ds.select(explode_outer(from_json(ds.col("value"), ArrayType.apply(getSchema())))
            .as("result"))
      .selectExpr("result.*");

This didn't quite work for me, so just to clarify: my input string is a JSON array of documents, and I want to keep the values of my name, address, and docs columns as strings as well, with the input array flattened out by the explode function. Any suggestions would be great. Thanks!

On Sat, Oct 7, 2017 at 10:00 AM, Jules Damji <dmat...@comcast.net> wrote:
> You might find these blogs helpful for parsing & extracting data from
> complex structures:
>
> https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html
>
> https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
>
> Cheers,
> Jules
>
> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth...@gmail.com> wrote:
>
> I have a Dataset<String> ds which consists of JSON rows.
>
> *Sample JSON row (this is just an example of one row in the dataset):*
>
> [
>   {"name": "foo", "address": {"state": "CA", "country": "USA"},
>    "docs": [{"subject": "english", "year": 2016}]},
>   {"name": "bar", "address": {"state": "OH", "country": "USA"},
>    "docs": [{"subject": "math", "year": 2017}]}
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert it into the following dataset using Spark 2.2.0:
>
> name  | address                           | docs
> ------|-----------------------------------|---------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java, but Scala is also fine as long as the functions are
> available in the Java API.
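One possible reason the StringType schema "didn't quite work": from_json needs the actual nested structure of each field, and a field declared as StringType will not capture a nested JSON object. A sketch of one workaround, assuming you are willing to declare the full nested schema and then convert the struct columns back to JSON strings with to_json (the schema and variable names here are illustrative, not from the thread):

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FlattenSketch {

    // Full nested schema for one document in the input array.
    static StructType documentSchema() {
        StructType addressSchema = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);
        StructType docSchema = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.LongType);
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", addressSchema)
            .add("docs", DataTypes.createArrayType(docSchema));
    }

    // Parse the "value" column as an array of documents, explode one row
    // per document, then turn the nested fields back into JSON strings.
    static Column[] flattenedColumns() {
        return new Column[] {
            col("result.name").as("name"),
            to_json(col("result.address")).as("address"),
            to_json(col("result.docs")).as("docs")
        };
    }

    // Usage (ds is the Dataset<String> from the thread):
    //
    // Dataset<Row> out = ds
    //     .select(explode_outer(
    //         from_json(col("value"),
    //                   DataTypes.createArrayType(documentSchema()))).as("result"))
    //     .select(flattenedColumns());
}
```

from_json accepting a DataType (so an ArrayType, not only a StructType) was added in Spark 2.2, so this should line up with the version mentioned above; to_json is what gets address and docs back as strings rather than structs.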