GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22954
[DO-NOT-MERGE][POC] Enables Arrow optimization from R DataFrame to Spark DataFrame

## What changes were proposed in this pull request?

This PR is not intended for merging; it aims to demonstrate the feasibility (reusing the PyArrow code path where possible) and the performance improvement of converting R data.frames to Spark DataFrames via Arrow. This can be tested as below:

```bash
$ ./bin/sparkR --conf spark.sql.execution.arrow.enabled=true
```

```r
collect(createDataFrame(mtcars))
```

**Requirements:**
- R 3.5.x
- Arrow package 0.12+ (not released yet; CRAN release tracked by ARROW-3204)
- withr package

**TODOs:**
- [ ] Performance measurement
- [ ] TBD

## How was this patch tested?

A small test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark r-arrow-createdataframe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22954.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22954

----

commit 90011a5ff48f2c5fa5fae0e2573fcdaa85d44976
Author: hyukjinkwon <gurwls223@...>
Date: 2018-11-06T02:38:37Z

    [POC] Enables Arrow optimization from R DataFrame to Spark DataFrame