GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/15769

    [SPARK-18191][CORE] Port RDD API to use commit protocol

    ## What changes were proposed in this pull request?
    
    This PR ports the RDD API to use the commit protocol. The changes made here are:
    1. Add a new internal helper class, `SparkNewHadoopWriter`, that saves an
RDD using a Hadoop OutputFormat. It is similar to `SparkHadoopWriter` but uses
the commit protocol. This class supports the newer `mapreduce` API instead of
the old `mapred` API, which is supported by `SparkHadoopWriter`;
    2. Rewrite the `PairRDDFunctions.saveAsNewAPIHadoopDataset` function so that
it now uses the commit protocol.
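To illustrate what "uses the commit protocol" means for a writer like `SparkNewHadoopWriter`, here is a minimal, self-contained Scala sketch of the job/task commit lifecycle, loosely modeled on Spark's `FileCommitProtocol`. It is an assumption-laden toy, not the code in this PR: `Storage`, `StagingCommitProtocol`, `TaskCommit` and `CommitProtocolSketch.write` are hypothetical names, "files" live in an in-memory map, and the real protocol takes Hadoop `JobContext`/`TaskAttemptContext` arguments and works on real files. Each task writes into a per-task temporary directory, a failed task is aborted, and output only reaches the final directory when the driver commits the job.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a file system (illustration only).
final class Storage {
  val files = mutable.Map.empty[String, Vector[String]]
  def append(path: String, line: String): Unit =
    files(path) = files.getOrElse(path, Vector.empty) :+ line
  def rename(from: String, to: String): Unit =
    files.remove(from).foreach(data => files(to) = data)
  def deletePrefix(prefix: String): Unit =
    files.keys.filter(_.startsWith(prefix)).toList.foreach(files.remove)
}

// Hypothetical message a task returns to the driver: the temp files it wrote.
final case class TaskCommit(tempFiles: Seq[String])

// Hypothetical commit protocol; the lifecycle loosely mirrors Spark's
// FileCommitProtocol, but the names and signatures are simplified.
final class StagingCommitProtocol(storage: Storage, outputDir: String) {
  private val tempDir = s"$outputDir/_temporary/"

  def setupJob(): Unit = storage.deletePrefix(tempDir)
  // Clear leftovers from an earlier attempt of the same task.
  def setupTask(taskId: Int): Unit = storage.deletePrefix(s"$tempDir$taskId/")
  def newTaskTempFile(taskId: Int, name: String): String = s"$tempDir$taskId/$name"
  def commitTask(taskId: Int): TaskCommit =
    TaskCommit(storage.files.keys.filter(_.startsWith(s"$tempDir$taskId/")).toList.sorted)
  def abortTask(taskId: Int): Unit = storage.deletePrefix(s"$tempDir$taskId/")
  // Move every committed task's files into the final output directory.
  def commitJob(commits: Seq[TaskCommit]): Unit = {
    for (commit <- commits; tmp <- commit.tempFiles)
      storage.rename(tmp, s"$outputDir/${tmp.split('/').last}")
    storage.deletePrefix(tempDir)
  }
  def abortJob(): Unit = storage.deletePrefix(tempDir)
}

object CommitProtocolSketch {
  // One task per partition; `failTask` injects a failure for testing.
  // Returns true if the job committed, false if it was aborted.
  def write(partitions: Seq[Seq[String]], protocol: StagingCommitProtocol,
            storage: Storage, failTask: Option[Int] = None): Boolean = {
    protocol.setupJob()
    try {
      val commits = partitions.zipWithIndex.map { case (records, taskId) =>
        protocol.setupTask(taskId)
        try {
          val path = protocol.newTaskTempFile(taskId, f"part-$taskId%05d")
          records.foreach(record => storage.append(path, record))
          if (failTask.contains(taskId)) throw new RuntimeException(s"task $taskId failed")
          protocol.commitTask(taskId)
        } catch {
          case e: Exception =>
            protocol.abortTask(taskId) // discard this task's temp output
            throw e
        }
      }
      protocol.commitJob(commits)
      true
    } catch {
      case _: Exception =>
        protocol.abortJob() // discard every task's temp output
        false
    }
  }

  def main(args: Array[String]): Unit = {
    val storage = new Storage
    val committed =
      write(Seq(Seq("a", "b"), Seq("c")), new StagingCommitProtocol(storage, "/out"), storage)
    println(s"committed=$committed files=${storage.files.keys.toList.sorted}")
  }
}
```

In this sketch a failure in any task triggers `abortJob`, so no partial `part-*` files become visible under the output directory, while a successful run publishes exactly one file per non-empty partition.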
    
    ## How was this patch tested?
    Existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark rdd-commit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15769
    
----
commit a0426c8d5fdac5bb59faff3681c1180b24cb314e
Author: jiangxingbo <jiangxb1...@gmail.com>
Date:   2016-11-04T14:27:17Z

    port RDD API to use commit protocol.

commit e017e1e501429f55ebdbbf98f8aaab8f53902a40
Author: jiangxingbo <jiangxb1...@gmail.com>
Date:   2016-11-04T14:36:28Z

    update comment.

----


