Hello, I think you should use *from_json* from org.apache.spark.sql.functions <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#from_json-org.apache.spark.sql.Column-org.apache.spark.sql.types.DataType-> to parse the JSON string in the value column into a struct column, using a StructType that describes the schema. Afterwards, you can create a new Dataset by selecting the fields you want from that struct.
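A minimal sketch of the idea in Java, in case it helps. It assumes that each element of ds is a single JSON object (not the whole bracketed array), that year fits in an integer, and that Spark 2.2 is on the classpath. The class name JsonRowsToColumns and the methods schema() and parse() are hypothetical names made up for this illustration; only from_json, col, StructType and the Dataset calls come from Spark.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, for illustration only.
public class JsonRowsToColumns {

    // Schema matching the sample rows: name, address{state, country}, docs[{subject, year}].
    // Assumption: "year" is an integer; adjust the types to your real data.
    public static StructType schema() {
        StructType address = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);
        StructType doc = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.IntegerType);
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", address)
            .add("docs", DataTypes.createArrayType(doc));
    }

    // Parse the single string column "value" into a struct, then expand
    // the struct's fields into top-level columns.
    public static Dataset<Row> parse(Dataset<String> ds) {
        return ds
            .select(functions.from_json(functions.col("value"), schema()).as("json"))
            .select("json.*");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("from_json sketch")
            .master("local[*]")
            .getOrCreate();

        // Assumption: one JSON object per row, as in the sample data.
        Dataset<String> ds = spark.createDataset(Arrays.asList(
            "{\"name\": \"foo\", \"address\": {\"state\": \"CA\", \"country\": \"USA\"}, "
                + "\"docs\": [{\"subject\": \"english\", \"year\": 2016}]}",
            "{\"name\": \"bar\", \"address\": {\"state\": \"OH\", \"country\": \"USA\"}, "
                + "\"docs\": [{\"subject\": \"math\", \"year\": 2017}]}"),
            Encoders.STRING());

        Dataset<Row> result = parse(ds);
        result.printSchema();
        result.show(false);

        spark.stop();
    }
}
```

If a row really holds a JSON array of objects, one option would be to pass an ArrayType of the struct to from_json and then explode the result, but I have not tried that against your data.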
On 7 October 2017 at 09:30, kant kodali <kanth...@gmail.com> wrote:
> I have a Dataset<String> ds which consists of json rows.
>
> *Sample Json Row (This is just an example of one row in the dataset)*
>
> [
> {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]}
> {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert into the following dataset using Spark 2.2.0
>
> name  | address                           | docs
> ----------------------------------------------------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java but Scala is also fine as long as there are functions
> available in Java API