Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

Matteo Cossu Sat, 07 Oct 2017 08:29:17 -0700

Hello,
I think you should use *from_json* from org.apache.spark.sql.functions
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#from_json-org.apache.spark.sql.Column-org.apache.spark.sql.types.DataType->
to parse the JSON string and convert it to a StructType (or, since each row
here is a JSON array, to an ArrayType of StructType, which from_json accepts
as of Spark 2.2.0). Afterwards, you can create a new Dataset by selecting the
columns you want.
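
A minimal sketch of that approach in Java, in case it helps. The class name
JsonRowsToColumns and the method toColumns are made up for illustration, and
the schema is only my reading of the sample row (I assumed "year" is an
integer). It also assumes each row is a valid JSON array, i.e. the objects
are separated by commas; from_json returns null for a string it cannot parse.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.from_json;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, not part of Spark.
public class JsonRowsToColumns {

    // Turns a Dataset<String> of JSON arrays into one row per array element,
    // with the columns name, address and docs.
    static Dataset<Row> toColumns(Dataset<String> ds) {
        // Schema assumed from the sample row in the question.
        StructType address = new StructType()
                .add("state", DataTypes.StringType)
                .add("country", DataTypes.StringType);
        StructType doc = new StructType()
                .add("subject", DataTypes.StringType)
                .add("year", DataTypes.IntegerType);
        StructType record = new StructType()
                .add("name", DataTypes.StringType)
                .add("address", address)
                .add("docs", DataTypes.createArrayType(doc));
        // Each input string holds an array of such records.
        DataType schema = DataTypes.createArrayType(record);

        return ds
                // Dataset<String> exposes its content as the column "value".
                .select(from_json(col("value"), schema).as("records"))
                // One output row per element of the parsed array.
                .select(explode(col("records")).as("record"))
                // Pull the struct fields up into top-level columns.
                .select("record.name", "record.address", "record.docs");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("json-rows-to-columns")
                .getOrCreate();

        String row = "[{\"name\": \"foo\", \"address\": {\"state\": \"CA\", \"country\": \"USA\"}, "
                + "\"docs\": [{\"subject\": \"english\", \"year\": 2016}]}, "
                + "{\"name\": \"bar\", \"address\": {\"state\": \"OH\", \"country\": \"USA\"}, "
                + "\"docs\": [{\"subject\": \"math\", \"year\": 2017}]}]";
        Dataset<String> ds = spark.createDataset(Arrays.asList(row), Encoders.STRING());

        Dataset<Row> result = toColumns(ds);
        result.printSchema();
        result.show(false);

        spark.stop();
    }
}
```

Here address comes back as a struct column and docs as an array of structs.
If you need them as JSON strings instead, to_json from the same functions
class may be worth a look.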


On 7 October 2017 at 09:30, kant kodali <kanth...@gmail.com> wrote:

> I have a Dataset<String> ds which consists of JSON rows.
>
> *Sample Json Row (This is just an example of one row in the dataset)*
>
> [
>     {"name": "foo", "address": {"state": "CA", "country": "USA"}, 
> "docs":[{"subject": "english", "year": 2016}]}
>     {"name": "bar", "address": {"state": "OH", "country": "USA"}, 
> "docs":[{"subject": "math", "year": 2017}]}
>
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert it into the following dataset using Spark 2.2.0
>
> name  | address                           | docs
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java, but Scala is also fine as long as the functions are
> available in the Java API.
>
