GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22939

    [SPARK-25446][R] Add schema_of_json() and schema_of_csv() to R

    ## What changes were proposed in this pull request?
    
    This PR proposes to expose `schema_of_json` and `schema_of_csv` on the R side.
    
    **`schema_of_json`**:
    
    ```r
    > json <- '{"name":"Bob"}'
    > df <- sql("SELECT * FROM range(1)")
    > head(select(df, schema_of_json(json)))
      schema_of_json({"name":"Bob"})
    1            struct<name:string>
    ```
    
    **`schema_of_csv`**:
    
    ```r
    > csv <- "Amsterdam,2018"
    > df <- sql("SELECT * FROM range(1)")
    > head(select(df, schema_of_csv(csv)))
      schema_of_csv(Amsterdam,2018)
    1    struct<_c0:string,_c1:int>
    ```
    
    This is useful together with `to_json`/`to_csv` and `from_json`/`from_csv`, for example to infer the schema that `from_json`/`from_csv` needs:
    
    ```r
    > df <- sql("SELECT named_struct('name', 'Bob') as people")
    > df <- mutate(df, people_json = to_json(df$people))
    > head(select(df, from_json(df$people_json,
    +                           schema_of_json(head(df)$people_json))))
      from_json(people_json)
    1                    Bob
    ```
    
    ```r
    > df <- sql("SELECT named_struct('name', 'Bob') as people")
    > df <- mutate(df, people_json = to_csv(df$people))
    > head(select(df, from_csv(df$people_json,
    +                          schema_of_csv(head(df)$people_json))))
      from_csv(people_json)
    1                   Bob
    ```
    
    ## How was this patch tested?
    
    Manually tested, unit tests added, and documentation manually built and verified.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22939
    
----
commit c4a78fc0b14876a857bdd9b2f8f094744dd76c04
Author: hyukjinkwon <gurwls223@...>
Date:   2018-11-04T08:46:20Z

    Add schema_of_json() and schema_of_csv() to R

----

