[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15707


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread ericl
GitHub user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15707#discussion_r85870354
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala ---
@@ -133,7 +133,7 @@ object WriteOutput extends Logging {
   sparkAttemptNumber = taskContext.attemptNumber(),
   committer,
   iterator = iter)
-  }).flatten.distinct
+  })
--- End diff --

Move the distinct to updatedPartitions?
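
For readers following along, here is a rough sketch of what this suggestion could look like. It uses plain Scala collections only: `perTaskPartitions` and the partition strings are made-up stand-ins for whatever the write job returns, and the way `updatedPartitions` is derived here is an assumption, not the code in the patch.

```scala
// Hypothetical illustration only; names and values are not taken from the patch.
object DistinctPlacement {
  // Stand-in for the raw job result: each task reports the partitions it wrote to.
  val perTaskPartitions: Seq[Seq[String]] = Seq(
    Seq("date=2016-10-30", "date=2016-10-31"),
    Seq("date=2016-10-31"),
    Seq.empty
  )

  // Deduplicate where the partition list is consumed instead of
  // immediately after the job returns.
  val updatedPartitions: Seq[String] = perTaskPartitions.flatten.distinct
}
```

With this placement the job result stays untouched, and the flattening and deduplication sit next to the code that needs the partition list.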





[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/15707

[SPARK-18024][SQL] Introduce an internal commit protocol API - rebased

## What changes were proposed in this pull request?
This patch introduces an internal commit protocol API that the batch data
source uses to commit writes. It currently has only one implementation, which
uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API
can be used to unify streaming and batch commits.
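
To make the idea concrete, below is a minimal sketch of what a commit protocol of this kind might look like. Every name in it (`CommitProtocol`, `TaskCommitMessage`, `InMemoryCommitProtocol`, and all method signatures) is hypothetical and simplified for illustration; it is not the API added by this patch and it does not touch Hadoop's OutputCommitter. The only detail borrowed from the commit log is that implementations must be serializable.

```scala
import scala.collection.mutable

// Hypothetical message a task sends back to the driver when it commits.
final case class TaskCommitMessage(taskId: Int, files: Seq[String])

// Hypothetical, simplified protocol: job-level and task-level setup/commit/abort.
trait CommitProtocol extends Serializable {
  def setupJob(): Unit
  def setupTask(taskId: Int): Unit
  def commitTask(taskId: Int, files: Seq[String]): TaskCommitMessage
  def abortTask(taskId: Int): Unit
  def commitJob(messages: Seq[TaskCommitMessage]): Unit
  def abortJob(): Unit
}

// Toy in-memory implementation: files become visible only at job commit.
class InMemoryCommitProtocol extends CommitProtocol {
  private val pending = mutable.Map.empty[Int, Seq[String]]
  private val committed = mutable.ArrayBuffer.empty[String]

  def setupJob(): Unit = { pending.clear(); committed.clear() }

  def setupTask(taskId: Int): Unit = { pending.update(taskId, Seq.empty) }

  def commitTask(taskId: Int, files: Seq[String]): TaskCommitMessage = {
    pending.update(taskId, files)
    TaskCommitMessage(taskId, files)
  }

  def abortTask(taskId: Int): Unit = { pending.remove(taskId); () }

  def commitJob(messages: Seq[TaskCommitMessage]): Unit = {
    messages.foreach(m => committed ++= m.files)
    pending.clear()
  }

  def abortJob(): Unit = { pending.clear(); committed.clear() }

  // Files visible to readers after a successful job commit.
  def visibleFiles: Seq[String] = committed.toSeq
}
```

In this sketch a task's output stays invisible until the driver calls `commitJob` with the collected task messages, which is the general shape of a two-phase write commit. A streaming implementation could in principle plug in behind the same trait.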

## How was this patch tested?
Should be covered by existing write tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-18024-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15707


commit 8c4ae5eb7441fd5bc0b06276d5d02a2ebc6de4a0
Author: Eric Liang 
Date:   2016-10-27T21:45:52Z

Thu Oct 27 14:45:52 PDT 2016

commit 2484809e1735a7c3fc875f09c68c12d2cd99dd62
Author: Eric Liang 
Date:   2016-10-28T00:53:13Z

Thu Oct 27 17:53:13 PDT 2016

commit 4d967251ce01794f7cdab9f84b70fa5393d1d1f2
Author: Eric Liang 
Date:   2016-10-28T00:53:30Z

Thu Oct 27 17:53:29 PDT 2016

commit 72c4294bb401ff3795363d3c0bb436bb56844630
Author: Reynold Xin 
Date:   2016-10-31T17:56:49Z

WIP - commit API

commit 2a613516dd469bca5ed4d7b0f17f678e9e70e267
Author: Reynold Xin 
Date:   2016-10-31T17:57:18Z

Add commit protocol itself

commit 6af14b56590a0882800f62a2a2b939ee3715edbb
Author: Reynold Xin 
Date:   2016-10-31T20:46:35Z

Move output committer instantiation into MapReduceFileCommitterProtocol.

commit 6166093d511e833587d32e398338e2f47ccbcc8a
Author: Reynold Xin 
Date:   2016-10-31T20:50:13Z

Specify that implementations must be serializable.

commit 040bbba0bdbd647f963b7a61e18b69fd62565201
Author: Reynold Xin 
Date:   2016-10-31T22:16:05Z

Specify path

commit 51d0919577c71155adb7d4737e9441cede8fe97d
Author: Reynold Xin 
Date:   2016-10-31T22:36:46Z

Add documentation.

commit 2d7d373fe48d18037653c10424c8b1c978160958
Author: Reynold Xin 
Date:   2016-10-31T22:43:54Z

Make MapReduceFileCommitterProtocol serializable.

commit cd23d2f7bdf7a3ef9b93e77a3ae540d553398267
Author: Reynold Xin 
Date:   2016-11-01T00:34:31Z

Make protocol configurable.

commit 0647959cbbbaaf5fb5cfe31515c2598f99ee180f
Author: Reynold Xin 
Date:   2016-11-01T00:58:23Z

Merge pull request #15633 from ericl/spark-18087

[SPARK-18087] [SQL] Optimize insert to not require REPAIR TABLE



