I have the following so far:

    private StructType getSchema() {
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", DataTypes.StringType)
            .add("docs", DataTypes.StringType);
    }
    ds.select(explode_outer(from_json(ds.col("value"), ArrayType.apply(getSchema())))
            .as("result"))
      .selectExpr("result.*");

This didn't quite work for me, so just to clarify: my input string is a JSON array of documents, and I want to keep the values of my name, address, and docs columns as strings as well, with the input array flattened out by the explode function. Any suggestions would be great. Thanks!

On Sat, Oct 7, 2017 at 10:00 AM, Jules Damji <dmat...@comcast.net> wrote:
> You might find these blogs helpful for parsing & extracting data from
> complex structures:
>
> https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html
>
> https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
>
> Cheers,
> Jules
>
> On Oct 7, 2017, at 12:30 AM, kant kodali <kanth...@gmail.com> wrote:
>
> I have a Dataset<String> ds which consists of JSON rows.
>
> *Sample JSON row (this is just an example of one row in the dataset):*
>
> [
>   {"name": "foo", "address": {"state": "CA", "country": "USA"},
>    "docs": [{"subject": "english", "year": 2016}]},
>   {"name": "bar", "address": {"state": "OH", "country": "USA"},
>    "docs": [{"subject": "math", "year": 2017}]}
> ]
>
> ds.printSchema()
>
> root
>  |-- value: string (nullable = true)
>
> Now I want to convert it into the following dataset using Spark 2.2.0:
>
> name  | address                           | docs
> ------|-----------------------------------|---------------------------------------
> "foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
> "bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]
>
> Preferably Java, but Scala is also fine as long as the functions are
> available in the Java API.
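One possible reason the StringType schema "didn't quite work": from_json needs the actual nested structure of each field, and a field declared as StringType will not capture a nested JSON object. A sketch of one workaround, assuming you are willing to declare the full nested schema and then convert the struct columns back to JSON strings with to_json (the schema and variable names here are illustrative, not from the thread):

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FlattenSketch {

    // Full nested schema for one document in the input array.
    static StructType documentSchema() {
        StructType addressSchema = new StructType()
            .add("state", DataTypes.StringType)
            .add("country", DataTypes.StringType);
        StructType docSchema = new StructType()
            .add("subject", DataTypes.StringType)
            .add("year", DataTypes.LongType);
        return new StructType()
            .add("name", DataTypes.StringType)
            .add("address", addressSchema)
            .add("docs", DataTypes.createArrayType(docSchema));
    }

    // Parse the "value" column as an array of documents, explode one row
    // per document, then turn the nested fields back into JSON strings.
    static Column[] flattenedColumns() {
        return new Column[] {
            col("result.name").as("name"),
            to_json(col("result.address")).as("address"),
            to_json(col("result.docs")).as("docs")
        };
    }

    // Usage (ds is the Dataset<String> from the thread):
    //
    // Dataset<Row> out = ds
    //     .select(explode_outer(
    //         from_json(col("value"),
    //                   DataTypes.createArrayType(documentSchema()))).as("result"))
    //     .select(flattenedColumns());
}
```

from_json accepting a DataType (so an ArrayType, not only a StructType) was added in Spark 2.2, so this should line up with the version mentioned above; to_json is what gets address and docs back as strings rather than structs.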