GitHub user goldmedal opened a pull request:

    https://github.com/apache/spark/pull/19339

    [SPARK-22112][PYSPARK] Add an API to create a DataFrame from RDD[String] 
storing CSV

    ## What changes were proposed in this pull request?
    We added a method to the Scala API for creating a `DataFrame` from 
`Dataset[String]` storing CSV in 
[SPARK-15463](https://issues.apache.org/jira/browse/SPARK-15463), but PySpark 
has no `Dataset` to support this feature. Therefore, this PR adds an API to 
create a `DataFrame` from an `RDD[String]` storing CSV, which is also 
consistent with PySpark's `spark.read.json`.
    
    For example:
    ```
    >>> rdd = sc.textFile('python/test_support/sql/ages.csv')
    >>> df2 = spark.read.csv(rdd)
    >>> df2.dtypes
    [('_c0', 'string'), ('_c1', 'string')]
    ```
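
To make the resulting column names and types concrete, here is a plain-Python analogy of parsing a collection of CSV strings without a header or schema. It is only a sketch and not Spark's implementation: `rows_from_csv_lines` is a hypothetical helper, and the sample lines are made up rather than taken from `ages.csv`. It shows why each column gets a positional name (`_c0`, `_c1`, ...) and why every value stays a string.

```python
import csv


def rows_from_csv_lines(lines):
    """Parse an iterable of CSV-formatted strings into a list of dict rows.

    Hypothetical helper for illustration only. Columns are named by
    position (_c0, _c1, ...) and values are left as strings, mirroring
    the dtypes shown in the example above.
    """
    rows = []
    # csv.reader accepts any iterable that yields one string per record.
    for fields in csv.reader(lines):
        if not fields:  # csv.reader yields an empty list for a blank line
            continue
        rows.append({"_c%d" % i: value for i, value in enumerate(fields)})
    return rows


# Made-up sample data; the third record has a quoted field with a comma.
lines = ["Joe,20", "Tom,30", '"Hyukjin, K",25']
rows = rows_from_csv_lines(lines)
print(rows[0])
```

Nothing is type-inferred in this sketch, so numeric-looking fields such as `20` remain strings, which matches the `('_c1', 'string')` entry in the output above.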
    ## How was this patch tested?
    Added unit test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/goldmedal/spark SPARK-22112

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19339
    
----
commit d557892080c8d6ec33dd7a13f4b8cdad88b440b0
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T09:31:36Z

    add csv from `RDD[String]` API and related test case

commit baaa93f5e837cdba02922e183a3f81c287e19854
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T09:50:34Z

    fix test case

commit d4ef30abdda142a969400c9e6e11a089a5483385
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T11:59:08Z

    finish pyspark dataframe from rdd of csv string

commit 9bd4eed474fdfa20d5933558d519fb187694aa33
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T12:13:50Z

    modified comments

commit 7525b48d2b9b59b1d6ce74a145fc049cfce6529a
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T12:14:55Z

    modified comments

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
