GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/15707
[SPARK-18024][SQL] Introduce an internal commit protocol API - rebased

## What changes were proposed in this pull request?

This patch introduces an internal commit protocol API that the batch data source uses to commit writes. It currently has only one implementation, which uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API can be used to unify streaming and batch commits.

## How was this patch tested?

Should be covered by existing write tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-18024-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15707

----
commit 8c4ae5eb7441fd5bc0b06276d5d02a2ebc6de4a0
Author: Eric Liang <e...@databricks.com>
Date: 2016-10-27T21:45:52Z

    Thu Oct 27 14:45:52 PDT 2016

commit 2484809e1735a7c3fc875f09c68c12d2cd99dd62
Author: Eric Liang <e...@databricks.com>
Date: 2016-10-28T00:53:13Z

    Thu Oct 27 17:53:13 PDT 2016

commit 4d967251ce01794f7cdab9f84b70fa5393d1d1f2
Author: Eric Liang <e...@databricks.com>
Date: 2016-10-28T00:53:30Z

    Thu Oct 27 17:53:29 PDT 2016

commit 72c4294bb401ff3795363d3c0bb436bb56844630
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T17:56:49Z

    WIP - commit API

commit 2a613516dd469bca5ed4d7b0f17f678e9e70e267
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T17:57:18Z

    Add commit protocol itself

commit 6af14b56590a0882800f62a2a2b939ee3715edbb
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T20:46:35Z

    Move output committer instantiation into MapReduceFileCommitterProtocol.

commit 6166093d511e833587d32e398338e2f47ccbcc8a
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T20:50:13Z

    Specify that implementations must be serializable.

commit 040bbba0bdbd647f963b7a61e18b69fd62565201
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T22:16:05Z

    Specify path

commit 51d0919577c71155adb7d4737e9441cede8fe97d
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T22:36:46Z

    Add documentation.

commit 2d7d373fe48d18037653c10424c8b1c978160958
Author: Reynold Xin <r...@databricks.com>
Date: 2016-10-31T22:43:54Z

    Make MapReduceFileCommitterProtocol serializable.

commit cd23d2f7bdf7a3ef9b93e77a3ae540d553398267
Author: Reynold Xin <r...@databricks.com>
Date: 2016-11-01T00:34:31Z

    Make protocol configurable.

commit 0647959cbbbaaf5fb5cfe31515c2598f99ee180f
Author: Reynold Xin <r...@databricks.com>
Date: 2016-11-01T00:58:23Z

    Merge pull request #15633 from ericl/spark-18087

    [SPARK-18087] [SQL] Optimize insert to not require REPAIR TABLE

----
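As a rough illustration of the job/task lifecycle that a commit protocol like the one described in this pull request coordinates, the following Python sketch stages each task's output in a private temporary directory and publishes only the files of committed tasks when the job commits. It is a hypothetical, simplified model: the names `CommitProtocol`, `StagingDirCommitProtocol`, `setup_job`, `new_task_temp_file`, `commit_task`, `abort_task`, `commit_job`, `abort_job` and `run_demo` are illustrative assumptions, not the API added by this patch, and the sketch does not wrap Hadoop's OutputCommitter. The `_temporary` staging directory name only echoes the convention used by Hadoop's FileOutputCommitter.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod


class CommitProtocol(ABC):
    """Hypothetical two-phase commit interface for a file-based write job.

    Each task writes, then commits or aborts; the driver then commits or
    aborts the whole job using the messages returned by committed tasks.
    """

    @abstractmethod
    def setup_job(self): ...

    @abstractmethod
    def new_task_temp_file(self, task_id, ext): ...

    @abstractmethod
    def commit_task(self, task_id): ...

    @abstractmethod
    def abort_task(self, task_id): ...

    @abstractmethod
    def commit_job(self, task_commits): ...

    @abstractmethod
    def abort_job(self): ...


class StagingDirCommitProtocol(CommitProtocol):
    """Writes task output under <output>/_temporary and publishes it on job commit."""

    def __init__(self, output_path):
        # Only plain strings as state, so an instance can be pickled and shipped.
        self.output_path = output_path
        self.staging = os.path.join(output_path, "_temporary")

    def setup_job(self):
        os.makedirs(self.staging, exist_ok=True)

    def _task_dir(self, task_id):
        return os.path.join(self.staging, "task-%05d" % task_id)

    def new_task_temp_file(self, task_id, ext):
        # Each task gets a private directory, so concurrent tasks never collide.
        task_dir = self._task_dir(task_id)
        os.makedirs(task_dir, exist_ok=True)
        return os.path.join(task_dir, "part-%05d%s" % (task_id, ext))

    def commit_task(self, task_id):
        # The task's "commit message": the list of files it produced.
        task_dir = self._task_dir(task_id)
        if not os.path.isdir(task_dir):
            return []
        return [os.path.join(task_dir, name) for name in sorted(os.listdir(task_dir))]

    def abort_task(self, task_id):
        # Discard everything the failed task wrote.
        shutil.rmtree(self._task_dir(task_id), ignore_errors=True)

    def commit_job(self, task_commits):
        # Publish only files reported by committed tasks, then drop the staging area.
        for files in task_commits:
            for path in files:
                os.replace(path, os.path.join(self.output_path, os.path.basename(path)))
        shutil.rmtree(self.staging, ignore_errors=True)

    def abort_job(self):
        shutil.rmtree(self.staging, ignore_errors=True)


def run_demo():
    """Three tasks; task 1 fails and is aborted before the job commits."""
    out = tempfile.mkdtemp()
    proto = StagingDirCommitProtocol(out)
    proto.setup_job()
    task_commits = []
    for task_id in range(3):
        path = proto.new_task_temp_file(task_id, ".txt")
        with open(path, "w") as f:
            f.write("rows from task %d\n" % task_id)
        if task_id == 1:
            proto.abort_task(task_id)  # simulate a failed task
        else:
            task_commits.append(proto.commit_task(task_id))
    proto.commit_job(task_commits)
    return out, sorted(os.listdir(out))


if __name__ == "__main__":
    print(run_demo()[1])  # ['part-00000.txt', 'part-00002.txt']
```

Only the output of tasks 0 and 2 reaches the final directory, and the staging area is gone after the job commits. The commit history above notes that implementations must be serializable, presumably because the protocol object is created once and then used by the write tasks as well; the sketch keeps only plain string paths as state in the same spirit.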