GitHub user goldmedal opened a pull request: https://github.com/apache/spark/pull/19339
[SPARK-22112][PYSPARK] Add an API to create a DataFrame from RDD[String] storing CSV

## What changes were proposed in this pull request?

We added a method to the Scala API for creating a `DataFrame` from `Dataset[String]` storing CSV in [SPARK-15463](https://issues.apache.org/jira/browse/SPARK-15463), but PySpark doesn't have `Dataset` to support this feature. Therefore, I add an API to create a `DataFrame` from `RDD[String]` storing CSV, which is also consistent with PySpark's `spark.read.json`.

For example:

```
>>> rdd = sc.textFile('python/test_support/sql/ages.csv')
>>> df2 = spark.read.csv(rdd)
>>> df2.dtypes
[('_c0', 'string'), ('_c1', 'string')]
```

## How was this patch tested?

Added unit test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/goldmedal/spark SPARK-22112

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19339

----
commit d557892080c8d6ec33dd7a13f4b8cdad88b440b0
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T09:31:36Z

    add csv from `RDD[String]` API and related test case

commit baaa93f5e837cdba02922e183a3f81c287e19854
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T09:50:34Z

    fix test case

commit d4ef30abdda142a969400c9e6e11a089a5483385
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T11:59:08Z

    finish pyspark dataframe from rdd of csv string

commit 9bd4eed474fdfa20d5933558d519fb187694aa33
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T12:13:50Z

    modified comments

commit 7525b48d2b9b59b1d6ce74a145fc049cfce6529a
Author: goldmedal <liugs...@gmail.com>
Date:   2017-09-25T12:14:55Z

    modified comments

----
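
As a rough illustration of the result shown in the PR description (lines of CSV text becoming string-typed columns named `_c0`, `_c1`, ...), here is a plain-Python sketch that needs no Spark installation. It is only an analogy, not Spark's implementation: the helper `infer_default_schema` and the sample lines are hypothetical, and taking the column count from the first row is an assumption made for this sketch.

```python
import csv
from typing import Iterable, List, Tuple


def infer_default_schema(
    lines: Iterable[str],
) -> Tuple[List[Tuple[str, str]], List[List[str]]]:
    """Parse CSV lines and assign default names _c0, _c1, ... typed as 'string'.

    Hypothetical helper for illustration only; it mimics the shape of the
    `dtypes` output quoted in the PR, not Spark's actual parsing logic.
    """
    # csv.reader accepts any iterable of strings; blank lines yield empty rows.
    rows = [row for row in csv.reader(lines) if row]
    # Assumption of this sketch: the first row determines the column count.
    width = len(rows[0]) if rows else 0
    dtypes = [(f"_c{i}", "string") for i in range(width)]
    return dtypes, rows


# Hypothetical sample data standing in for the elements of an RDD[String].
sample_lines = ["alice,30", "bob,25"]
dtypes, rows = infer_default_schema(sample_lines)
print(dtypes)
print(rows)
```

Without a header or an explicit schema, every value stays a string and columns get positional names, which is why the PR example reports `'string'` for both columns.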