GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21550

    [SPARK-24543][SQL] Support any type as DDL string for from_json's schema

    ## What changes were proposed in this pull request?
    
    In this PR, I propose to support any `DataType` represented as a DDL string
    for the `from_json` function. After the changes, it will be possible to
    specify `MapType` in SQL like:
    ```sql
    select from_json('{"a":1, "b":2}', 'map<string, int>')
    ```
    and in Scala (similarly in other languages):
    ```scala
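    import org.apache.spark.sql.functions.from_json
    import spark.implicits._ // assumes a SparkSession named `spark` is in scope

    val in = Seq("""{"a": {"b": 1}}""").toDS()
    val schema = "map<string, map<string, int>>"
    val out = in.select(from_json($"value", schema, Map.empty[String, String]))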
    ```
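    A fuller sketch of the same usage, for readers who want to run it end to end.
    It assumes a Spark build that includes this change and a local `SparkSession`.
    The object `FromJsonDdlMapExample` and the helper `parseWithDdl` are
    hypothetical names used only for illustration, not part of Spark's API.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

    // Hypothetical example object, not part of Spark.
    object FromJsonDdlMapExample {
      // Hypothetical helper: parses each JSON string with a DDL type string
      // such as "map<string, int>" and returns one column named "parsed".
      def parseWithDdl(spark: SparkSession, jsons: Seq[String], ddl: String): DataFrame = {
        import spark.implicits._
        jsons.toDF("value")
          .select(from_json(col("value"), ddl, Map.empty[String, String]).as("parsed"))
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("from_json-ddl-map")
          .getOrCreate()
        try {
          // Scala API: a nested map as the root type of the schema.
          val nested = parseWithDdl(spark, Seq("""{"a": {"b": 1}}"""), "map<string, map<string, int>>")
          nested.schema("parsed").dataType match {
            case MapType(StringType, MapType(StringType, IntegerType, _), _) =>
              println("parsed as map<string, map<string, int>>")
            case other =>
              println(s"unexpected type: $other")
          }
          nested.show(false)

          // SQL API: the same DDL syntax passed as the second argument.
          spark.sql("""select from_json('{"a":1, "b":2}', 'map<string, int>') as parsed""")
            .show(false)
        } finally {
          spark.stop()
        }
      }
    }
    ```

    The helper returns a `DataFrame` whose single `parsed` column has the
    `MapType` described by the DDL string, so the same string can be passed
    through either the SQL or the Scala API.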
    
    ## How was this patch tested?
    
    Added a couple of SQL tests and modified existing tests for Python and Scala.
    The existing tests were modified because, for them, it does not matter in
    which format the schema for `from_json` is provided.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 from_json-ddl-schema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21550
    
----
commit 41d4522848610d3c8c7983157f0b4b7bded9dd94
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-13T05:56:33Z

    Support any types in schema DDL

commit f824f1651999f0ba8919d4b8d29329eb1f538237
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-13T05:56:57Z

    SQL tests for from_json

commit 08a01223354cf44174653996dae936aa09bf340d
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-13T06:47:46Z

    Support any DataType as schema for from_json

commit 41ad77ee74265a170191203bf0330a7c7b3b384d
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-13T06:48:40Z

    Test for MapType in PySpark's from_json

commit 5d53ec77f022a17a1ffb5c77937a32b3a32cea63
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-13T06:53:35Z

    Test for MapType in DDL as the root type for from_json

----

