Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

Jules Damji Sat, 07 Oct 2017 10:01:51 -0700

You might find these blog posts helpful for parsing and extracting data from
complex structures:


https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
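
For your specific case, here is a minimal sketch in Java rather than a definitive answer. It assumes each `value` holds one valid JSON array, that the field types are as guessed below (strings, with `year` as an integer), and that `from_json` in your Spark build accepts an array schema, which I believe it does from 2.2.0 on. The class name, method name, and SAMPLE constant are made up for illustration.

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.from_json;

public class JsonArrayToColumns {

    // Hypothetical sample: one dataset row holding a JSON array of two objects.
    static final String SAMPLE =
        "[{\"name\": \"foo\", \"address\": {\"state\": \"CA\", \"country\": \"USA\"},"
        + " \"docs\": [{\"subject\": \"english\", \"year\": 2016}]},"
        + " {\"name\": \"bar\", \"address\": {\"state\": \"OH\", \"country\": \"USA\"},"
        + " \"docs\": [{\"subject\": \"math\", \"year\": 2017}]}]";

    // Parses the string column "value" as an array of structs, then turns
    // each array element into its own row with name, address and docs columns.
    static Dataset<Row> toColumns(Dataset<String> ds) {
        // Assumed field types; adjust them to match the real data.
        StructType address = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);
        StructType doc = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.IntegerType);
        StructType element = new StructType()
            .add("name", DataTypes.StringType)
            .add("address", address)
            .add("docs", DataTypes.createArrayType(doc));
        ArrayType schema = DataTypes.createArrayType(element);

        return ds
            .select(explode(from_json(col("value"), schema)).as("row"))
            .select("row.name", "row.address", "row.docs");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("json-array-to-columns")
            .getOrCreate();

        Dataset<String> ds =
            spark.createDataset(Collections.singletonList(SAMPLE), Encoders.STRING());

        Dataset<Row> result = toColumns(ds);
        result.printSchema();
        result.show(false);

        spark.stop();
    }
}
```

Here `address` comes back as a struct and `docs` as an array of structs rather than as JSON strings; if you need the string form, wrapping those columns in `to_json` may be an option, but check what your Spark version supports.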

Cheers 
Jules



> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth...@gmail.com> wrote:
> 
> I have a Dataset<String> ds which consists of json rows.
> 
> Sample Json Row (This is just an example of one row in the dataset)
> 
> [
>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
> ]
> 
> ds.printSchema()
> 
> root
>  |-- value: string (nullable = true)
> 
> Now I want to convert it into the following dataset using Spark 2.2.0:
> 
> name  | address                           | docs
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
> 
> Preferably in Java, but Scala is also fine as long as the functions are
> available in the Java API.
