GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/15769
[SPARK-18191][CORE] Port RDD API to use commit protocol

## What changes were proposed in this pull request?

This PR ports the RDD API to use the commit protocol. The changes made here:

1. Add a new internal helper class, `SparkNewHadoopWriter`, that saves an RDD using a Hadoop OutputFormat. It is similar to `SparkHadoopWriter` but uses the commit protocol. This class supports the newer `mapreduce` API, instead of the old `mapred` API that `SparkHadoopWriter` supports;
2. Rewrite the `PairRDDFunctions.saveAsNewAPIHadoopDataset` function so that it now uses the commit protocol.

## How was this patch tested?

Existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark rdd-commit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15769

----
commit a0426c8d5fdac5bb59faff3681c1180b24cb314e
Author: jiangxingbo <jiangxb1...@gmail.com>
Date:   2016-11-04T14:27:17Z

    port RDD API to use commit protocol.

commit e017e1e501429f55ebdbbf98f8aaab8f53902a40
Author: jiangxingbo <jiangxb1...@gmail.com>
Date:   2016-11-04T14:36:28Z

    update comment.

----