[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6864


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114179593
  
LGTM. I am merging it to master.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114168411
  
@nemccarthy Yeah. This one should be in today. I am doing a final check now.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114052709
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114052689
  
[Test build #35437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35437/console) for PR 6864 at commit [`db7a46a`](https://github.com/apache/spark/commit/db7a46a169d1789ff221d7f84315edc3df04511a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread nemccarthy

Github user nemccarthy commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114047349
  
Any chance this can be merged today?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114034722
  
#6932 was opened to backport this PR to branch-1.4.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114031522
  
[Test build #35437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35437/consoleFull) for PR 6864 at commit [`db7a46a`](https://github.com/apache/spark/commit/db7a46a169d1789ff221d7f84315edc3df04511a).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114030426
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114030392
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114021183
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114021161
  
[Test build #35429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35429/console) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32907926
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --

Right, it's handled in `writeRows`. Agreed on adding more comments; I made multiple mistakes here myself...
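
The flow discussed here can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Spark code: `SketchWriterContainer`, its `events` log, and the `failOnCommit` flag are hypothetical names introduced for the example. It assumes that `commitTask` only wraps and rethrows the failure, and that `writeRows` is the single place that catches it and calls `abortTask`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a writer container; not the real Spark class.
class SketchWriterContainer(failOnCommit: Boolean) {
  // Records which lifecycle steps ran, so the flow can be inspected.
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def commitTask(): Unit = {
    try {
      if (failOnCommit) throw new IllegalStateException("simulated commit failure")
      events += "commit"
    } catch {
      // Only wrap and rethrow here; aborting is left to the caller.
      case cause: Throwable =>
        throw new RuntimeException("Failed to commit task", cause)
    }
  }

  def abortTask(): Unit = events += "abort"

  def writeRows(rows: Seq[String]): Unit = {
    try {
      rows.foreach(row => events += s"write:$row")
      commitTask()
    } catch {
      // Assumed single abort point, so abortTask runs exactly once per failure.
      case cause: Throwable =>
        abortTask()
        throw new RuntimeException("Task failed while writing rows", cause)
    }
  }
}
```

With `failOnCommit = true`, calling `writeRows(Seq("a"))` throws and leaves `events` holding the write followed by a single abort. If `commitTask` also called `abortTask`, the abort would run twice.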



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-114016526
  
LGTM. Left two comments regarding adding comments/docs.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32907403
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --

Actually, I think we also need to add documentation to `InsertIntoHadoopFsRelation` to explain the flow of this command and how we handle different kinds of failures/errors.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32907374
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --

This exception will be caught in `writeRows`, right? If so, can we add a comment and also explain how we will handle this exception?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113999842
  
[Test build #35429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35429/consoleFull) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113999024
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113998951
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113998876
  
test this please



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113998119
  
**[Test build #35423 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35423/console)** for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb) after a configured wait of `175m`.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113998128
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113980289
  
[Test build #35423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35423/consoleFull) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113980248
  
It turns out that increasing the number of threads of the local `SparkContext` used by `TestHive` (and running the tests on a node with relatively more cores, such as our Jenkins builder) is quite useful for detecting concurrency-related bugs. SPARK-8501 and SPARK-8513 were both detected this way.
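
As a rough illustration of how the thread count of a local test master could be kept overridable, here is a minimal sketch. It mirrors the `spark.sql.test.master` property and the `local[N]` master string that appear in the `TestHive` diff in this thread, but `TestMasterSketch`, `testMaster`, and `localThreads` are hypothetical helpers written for this example and are not part of Spark. The sketch reads from a `Properties` object rather than the JVM-wide system properties so that it stays easy to test.

```scala
import java.util.Properties

// Hypothetical helper object; not part of Spark.
object TestMasterSketch {
  // Matches master strings of the form local[N] with a numeric N.
  private val LocalN = """local\[(\d+)\]""".r

  // Resolve the test master, defaulting to 32 local threads unless overridden.
  def testMaster(props: Properties): String =
    props.getProperty("spark.sql.test.master", "local[32]")

  // Extract the fixed thread count, if the master string specifies one.
  def localThreads(master: String): Option[Int] = master match {
    case LocalN(n) => Some(n.toInt)
    case "local"   => Some(1)
    case _         => None // e.g. local[*] or a cluster URL: no fixed count
  }
}
```

With an empty `Properties`, `localThreads(testMaster(props))` yields `Some(32)`. Setting `spark.sql.test.master` to `local[2]` yields `Some(2)`, which is how a developer on a small machine could fall back to fewer threads.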



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113980134
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113980139
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113974351
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113974339
  
[Test build #35417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35417/console) for PR 6864 at commit [`3207323`](https://github.com/apache/spark/commit/32073239d8a03c6c7404c9dab7555c0cae1b5455).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113971822
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113971804
  
[Test build #35416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35416/console) for PR 6864 at commit [`2738368`](https://github.com/apache/spark/commit/2738368eb04c256a4c28a802b5a3b12a5458fda8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113971004
  
[Test build #950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/950/console) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32899080
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
       context: TaskAttemptContext): OutputWriter = {
     new SimpleTextOutputWriter(path, context) {
       override def close(): Unit = {
+        super.close()
--- End diff --

I decided to leave it there. The writer should be closed anyway; otherwise it leaks.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113966066
  
[Test build #35417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35417/consoleFull) for PR 6864 at commit [`3207323`](https://github.com/apache/spark/commit/32073239d8a03c6c7404c9dab7555c0cae1b5455).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113965968
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113965962
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898807
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
       context: TaskAttemptContext): OutputWriter = {
     new SimpleTextOutputWriter(path, context) {
       override def close(): Unit = {
+        super.close()
--- End diff --

I was thinking about S3, where a file is not actually created before the 
output stream is closed (the `PUT` operation happens in 
`NativeS3FsOutputStream.close()`). But `SimpleTextRelation` is only used for 
local testing, so yeah, this line is not necessary.
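
To make the S3 concern concrete, here is a minimal sketch of a stream whose content only becomes visible when it is closed. It is an in-memory illustration under stated assumptions, not the Hadoop implementation: `SketchObjectStore` and `DeferredUploadStream` are hypothetical names, and the map stands in for the remote store that `NativeS3FsOutputStream.close()` would write to.

```scala
import java.io.ByteArrayOutputStream
import scala.collection.mutable

// Hypothetical in-memory "object store"; stands in for a remote store such as S3.
class SketchObjectStore {
  val objects: mutable.Map[String, Array[Byte]] = mutable.Map.empty
}

// Buffers all bytes locally and only publishes them when close() is called.
class DeferredUploadStream(store: SketchObjectStore, key: String)
    extends ByteArrayOutputStream {
  private var closed = false

  override def close(): Unit = {
    if (!closed) {
      closed = true
      store.objects(key) = toByteArray // the single simulated "PUT" happens here
    }
    super.close()
  }
}
```

A writer wrapping such a stream produces no object at all until `close()` runs, so an overriding `close()` that skips `super.close()` would leave nothing behind to commit or clean up:

```scala
val store = new SketchObjectStore
val out = new DeferredUploadStream(store, "part-00000")
out.write("hello".getBytes("UTF-8"))
// Nothing is visible in the store yet.
out.close()
// Only now does store.objects contain "part-00000".
```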



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898686
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -44,7 +44,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
     import org.apache.spark.sql.hive.test.TestHive.implicits._
 
     sparkContext
-      .makeRDD(1 to 10)
+      .makeRDD(1 to 100)
--- End diff --

The JIRA is already there: https://issues.apache.org/jira/browse/SPARK-8501

Adding the comments.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113964832
  
[Test build #35416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35416/consoleFull) for PR 6864 at commit [`2738368`](https://github.com/apache/spark/commit/2738368eb04c256a4c28a802b5a3b12a5458fda8).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898677
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -49,7 +49,7 @@ import scala.collection.JavaConversions._
 object TestHive
   extends TestHiveContext(
 new SparkContext(
-  System.getProperty("spark.sql.test.master", "local[2]"),
+  System.getProperty("spark.sql.test.master", "local[32]"),
--- End diff --

I think we'd better use a fixed number here to improve determinism (if we had 
used 32 from the beginning, the ORC bug would have been much easier to 
reproduce).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113964744
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113964742
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113964712
  
btw, do we need a PR for 1.4 backport?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113963997
  
[Test build #950 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/950/consoleFull) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898467
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -290,6 +298,9 @@ private[sql] abstract class BaseWriterContainer(
 setupIDs(0, 0, 0)
 setupConf()
 
+ContextUtil.getConfiguration(job).set(
+  "spark.sql.sources.writeJobUUID", uniqueWriteJobId.toString)
--- End diff --

We need to add a comment explaining how this UUID gets sent to the executor side.
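
To make the question concrete, here is a minimal standalone sketch of the idea: the driver stamps the UUID into a configuration object, the configuration is serialized, and the receiving side reads the same key back. It uses `java.util.Properties` and plain Java serialization purely as stand-ins (an assumption for illustration); it is not how `BaseWriterContainer` or Hadoop's `Configuration` is actually implemented, and the object and method names are hypothetical. Only the key `spark.sql.sources.writeJobUUID` comes from the diff above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.{Properties, UUID}

object JobUuidPropagation {
  // Key taken from the diff under review.
  val WriteJobUuidKey = "spark.sql.sources.writeJobUUID"

  // "Driver side": stamp the job-level UUID into the configuration.
  def driverConf(jobId: UUID): Properties = {
    val conf = new Properties()
    conf.setProperty(WriteJobUuidKey, jobId.toString)
    conf
  }

  // Stand-in for shipping the configuration to an executor:
  // serialize it to bytes, then deserialize a fresh copy.
  def ship(conf: Properties): Properties = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(conf)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[Properties] finally in.close()
  }

  // "Executor side": recover the UUID from the received configuration.
  def executorJobId(conf: Properties): UUID =
    UUID.fromString(conf.getProperty(WriteJobUuidKey))
}
```

The point of the sketch is only that the UUID travels as an ordinary configuration entry, so every task that receives the configuration sees the same job-level value.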



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898397
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala 
---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
 context: TaskAttemptContext): OutputWriter = {
   new SimpleTextOutputWriter(path, context) {
 override def close(): Unit = {
+  super.close()
--- End diff --

Do we need this?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898395
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -44,7 +44,7 @@ abstract class OrcSuite extends QueryTest with 
BeforeAndAfterAll {
 import org.apache.spark.sql.hive.test.TestHive.implicits._
 
 sparkContext
-  .makeRDD(1 to 10)
+  .makeRDD(1 to 100)
--- End diff --

Can we add a comment in this suite documenting the problem we ran into (and 
also create a JIRA)?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-21 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32898333
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -49,7 +49,7 @@ import scala.collection.JavaConversions._
 object TestHive
   extends TestHiveContext(
 new SparkContext(
-  System.getProperty("spark.sql.test.master", "local[2]"),
+  System.getProperty("spark.sql.test.master", "local[32]"),
--- End diff --

Maybe we should still use `local[*]`?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113735490
  
[Test build #35361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35361/console) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113735514
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113730049
  
[Test build #35361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35361/consoleFull) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113730026
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113730019
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113729847
  
With help from @yhuai, I finally found the root cause of the 
`OrcSourceSuite` failures shown in previous Jenkins builds. [SPARK-8501] [1] 
has been opened to track that issue.

The reason it shows up in this PR but couldn't be reproduced locally on my 
laptop is that I changed the thread count of the local `SparkContext` used by 
`TestHiveContext` to `*`, which means 32 cores on Jenkins and 8 cores on my 
laptop. Meanwhile, the testing data used in `OrcSourceSuite` consists of only 
10 rows. So the ORC table written on my laptop consists of 8 part-files, each 
containing at least one row, while the one written on Jenkins consists of 32 
part-files, some of which contain zero rows. It turned out that those empty 
ORC files messed things up. Please refer to [SPARK-8501] [1] for details.

For this reason, I made two more updates:

1. Changed `local[*]` to `local[32]` for more determinism. 32 is chosen 
because Jenkins has 32 cores, and it should be enough for detecting concurrency 
issues.
2. Increased the number of rows in the testing data used in `OrcSourceSuite` 
to 100 to temporarily work around the build failure. SPARK-8501 will be fixed 
in another PR.

[1]: https://issues.apache.org/jira/browse/SPARK-8501
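
The arithmetic behind the explanation above can be checked with a small standalone sketch. It assumes the rows are split evenly by index, with slice `i` covering `[i * length / numSlices, (i + 1) * length / numSlices)`, and that the number of slices equals the thread count of the local master; both are assumptions made for illustration rather than statements about the exact code path `makeRDD` takes. The object and method names are hypothetical.

```scala
object SliceSizes {
  // Sizes of the slices produced when `length` rows are split evenly
  // into `numSlices` parts. Slice i covers the index range
  // [i * length / numSlices, (i + 1) * length / numSlices).
  def sliceSizes(length: Int, numSlices: Int): Seq[Int] =
    (0 until numSlices).map { i =>
      val start = (i.toLong * length / numSlices).toInt
      val end = ((i + 1).toLong * length / numSlices).toInt
      end - start
    }

  // Number of slices that receive no rows at all; under the stated
  // assumptions each of these would become an empty part-file.
  def emptySlices(length: Int, numSlices: Int): Int =
    sliceSizes(length, numSlices).count(_ == 0)

  def main(args: Array[String]): Unit = {
    for ((rows, slices) <- Seq((10, 8), (10, 32), (100, 32))) {
      println(s"rows=$rows slices=$slices empty=${emptySlices(rows, slices)}")
    }
  }
}
```

Under these assumptions, 10 rows over 8 slices leaves no slice empty, 10 rows over 32 slices leaves 22 slices empty, and 100 rows over 32 slices again leaves none empty, which matches why the failure appeared only on Jenkins and why raising the row count to 100 hides it.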



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-20 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113728195
  
@lianhuiwang Yeah, thanks for the reminder. We are also working on this issue; 
it will be addressed in another PR. Originally, appending jobs with output 
committers like `DirectParquetOutputCommitter` could be tricky to handle, since 
they write directly to the target directory without using any temporary folder 
(this can be super useful for S3, since S3 file metadata operations and 
directory operations can be very slow). But with this PR, the job-level UUID 
can be used to distinguish files written by different jobs.
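
As a rough illustration of that last point, the sketch below builds part-file names that embed a job-level UUID and then picks out the files belonging to a single job, which is what cleaning up after a failed job would need. The naming pattern `part-r-<partition>-<uuid><extension>` and all object and method names here are hypothetical, chosen only for the example; the actual pattern is whatever the PR implements.

```scala
import java.util.UUID

object UuidFileNames {
  // Hypothetical naming scheme: zero-padded partition number,
  // followed by the job-level UUID and a file extension.
  def partFileName(partition: Int, jobId: UUID, extension: String): String =
    f"part-r-$partition%05d-$jobId$extension"

  // Select the files in a directory listing that were written by one job,
  // e.g. to delete them after that job has failed.
  def filesOfJob(files: Seq[String], jobId: UUID): Seq[String] =
    files.filter(_.contains(jobId.toString))
}
```

Two appending jobs that both write partition 1 into the same directory get different file names, so neither overwrites the other, and the files of either job can be identified without a temporary folder.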



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread lianhuiwang

Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113701854
  
Thanks @liancheng. @chenghao-intel, there is another situation that needs to 
be considered when using the data source interface: when some tasks have 
finished but the job fails because other tasks failed, all output files of 
that job need to be removed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113696384
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113696377
  
[Test build #35344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35344/console) for PR 6864 at commit [`d5698b2`](https://github.com/apache/spark/commit/d5698b216c32a77633fa58d356ea2155659f68ba).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113690721
  
[Test build #35344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35344/consoleFull) for PR 6864 at commit [`d5698b2`](https://github.com/apache/spark/commit/d5698b216c32a77633fa58d356ea2155659f68ba).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113690652
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113690646
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113641248
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113641157
  
[Test build #35313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35313/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113625705
  
[Test build #35313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35313/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113624005
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113624042
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113623626
  
Retesting to gather more test failure logs for diagnosis.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113623447
  
retest this please



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113581608
  
[Test build #936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/936/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113568792
  
[Test build #936 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/936/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32845485
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -290,6 +298,9 @@ private[sql] abstract class BaseWriterContainer(
 setupIDs(0, 0, 0)
 setupConf()
 
+ContextUtil.getConfiguration(job).set(
+  "spark.sql.sources.writeJobUUID", uniqueWriteJobId.toString)
--- End diff --

Why do we use a Parquet method here? Are we expecting that 
`getConfiguration` does not exist in some versions of `Job`?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32844987
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -70,7 +71,7 @@ private[sql] case class InsertIntoHadoopFsRelation(
   relation.paths.length == 1,
   s"Cannot write to multiple destinations: 
${relation.paths.mkString(",")}")
 
-val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+val hadoopConf = new 
Configuration(sqlContext.sparkContext.hadoopConfiguration)
--- End diff --

Do we need this? We already do `val job = new Job(hadoopConf)` below. BTW, 
we need to add a comment explaining that `new Job` will clone the conf.
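
A tiny standalone model of why the extra copy may be redundant: if the job object already clones the configuration it is given, later writes to the job's configuration never reach the shared one. `JobLike` below is a hypothetical stand-in built on a mutable map, not Hadoop's `Job`; it only mirrors the copy-on-construct behaviour described in this comment.

```scala
import scala.collection.mutable

object ConfIsolation {
  // Hypothetical stand-in for a job object that copies the
  // configuration it is constructed with instead of sharing it.
  final class JobLike(shared: mutable.Map[String, String]) {
    // Fresh map populated from the shared one; later writes stay local.
    val conf: mutable.Map[String, String] =
      mutable.Map.empty[String, String] ++= shared
  }
}
```

With this behaviour, setting `spark.sql.sources.writeJobUUID` on the job's configuration leaves the shared configuration untouched, so a second defensive copy beforehand adds nothing.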



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113475454
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113475413
  
  [Test build #35259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35259/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113462120
  
@yhuai Updated PR description with an updated version of the summary 
commented above.  This is ready for review.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113460593
  
  [Test build #35259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35259/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113460011
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113459995
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449949
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449937
  
  [Test build #35258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35258/console) for PR 6864 at commit [`6d946bd`](https://github.com/apache/spark/commit/6d946bd5d3d4bc5b701d5ffa83e9d0934603faef).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449649
  
@chenghao-intel Thanks for the comment! Speculation is a great point that I 
hadn't considered. I've updated this PR to use a job-level UUID instead of a 
task-level one, because essentially what we want is to avoid name collisions 
between different write jobs (potentially issued by different Spark 
applications). Within a single write job, we can always avoid name collisions 
with the help of the task ID.
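
A minimal sketch of the job-level UUID idea, not the code in this PR. It assumes the driver generates one UUID per write job and publishes it under `spark.sql.sources.writeJobUUID` (the key that appears in the diff), and that each task derives its file name from its task ID plus that UUID. The plain `Map` stands in for the Hadoop configuration, and `newJobConf`, `outputFileName`, and the `part-r-<task>-<uuid>` layout are hypothetical names chosen for illustration.

```scala
import java.util.UUID

object JobLevelUuidSketch {
  // Configuration key taken from the diff under review.
  val JobUuidKey = "spark.sql.sources.writeJobUUID"

  // Driver side: generate one UUID per write job and store it in the job's
  // configuration. A plain Map stands in for the Hadoop Configuration here.
  def newJobConf(): Map[String, String] =
    Map(JobUuidKey -> UUID.randomUUID().toString)

  // Task side: hypothetical naming scheme that combines the zero-padded
  // task ID with the job-level UUID read back from the configuration.
  def outputFileName(conf: Map[String, String], taskId: Int): String =
    f"part-r-$taskId%05d-${conf(JobUuidKey)}.parquet"
}
```

Under this scheme, two attempts of the same task (for example, a speculative one) resolve to the same file name, different tasks in one job differ by task ID, and different jobs differ by UUID.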



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449522
  
  [Test build #35258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35258/consoleFull) for PR 6864 at commit [`6d946bd`](https://github.com/apache/spark/commit/6d946bd5d3d4bc5b701d5ffa83e9d0934603faef).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449385
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113449360
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113042193
  
Thank you @liancheng for the summary, which is clear even to me, who hadn't 
dived into this part before. One thing I was thinking about while reviewing 
#6833: how do we remove the redundant files when a user switches on 
`speculative` while writing data via the data source interface?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113024897
  
Some background and a summary of the offline discussion with @yhuai about 
this issue:

In 1.4.0, we added `HadoopFsRelation` to abstract partition support for all 
data sources that are based on the Hadoop `FileSystem` interface.  Specifically, 
this makes partition discovery, partition pruning, and writing dynamic 
partitions for data sources much easier.  From the user's perspective, what the 
write path does is very similar to Hive.  However, they differ a lot internally.

When data are inserted into Hive tables via Spark SQL, 
`InsertIntoHiveTable` simulates Hive's behavior:

1.  Write data to a temporary location
2.  Commit the write job
3.  Move data from the temporary location to the final destination using:

    -   `Hive.loadTable()` for non-partitioned tables
    -   `Hive.loadPartition()` for static partitions
    -   `Hive.loadDynamicPartitions()` for dynamic partitions

The important part is that, for appending data to existing tables in step 
3, `Hive.copyFiles()` is invoked to move the data (I find the name rather 
confusing since no "copying" occurs here; we are just moving and renaming 
files).  If a file in the source directory and another file in the destination 
directory happen to have the same name, say `part-r-1.parquet`, the former 
is moved to the destination directory and renamed with a `_copy_N` suffix 
(`part-r-1_copy_1.parquet`).  That's how Hive avoids name collisions.
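
To illustrate the `_copy_N` renaming rule described above, here is a minimal sketch. It is not Hive's actual implementation: it works on an in-memory set of file names instead of a `FileSystem`, and `CopySuffixSketch` and `resolveName` are hypothetical names.

```scala
object CopySuffixSketch {
  // Returns `name` unchanged if it is free in the destination. Otherwise
  // inserts _copy_1, _copy_2, ... before the extension until an unused
  // name is found.
  def resolveName(name: String, existing: Set[String]): String = {
    if (!existing.contains(name)) name
    else {
      val dot = name.lastIndexOf('.')
      // Split into base name and extension (the extension keeps its dot).
      val (base, ext) = if (dot >= 0) name.splitAt(dot) else (name, "")
      Iterator.from(1)
        .map(n => s"${base}_copy_$n$ext")
        .find(candidate => !existing.contains(candidate))
        .get // safe: the iterator is infinite and `existing` is finite
    }
  }
}
```

Note that every collision check in the real setting is a file metadata operation against the destination, so the cost depends on how fast those operations are on the underlying store.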

Some alternative fixes considered:

1.  Use a similar approach to Hive

This approach is not preferred in Spark 1.4.0 mainly because file 
metadata operations in S3 tend to be slow, especially for tables with lots of 
files and/or partitions.  That's why `InsertIntoHadoopFsRelation` writes 
directly to the destination directory, and is often used together with 
`DirectParquetOutputCommitter` to reduce latency when working with S3.  This 
means we don't have the chance to do renaming, and must avoid name collisions 
from the beginning.

2.  Same as 1.3, just move max part number detection back to the driver side

This isn't doable because, unlike 1.3, 1.4 also takes dynamic 
partitioning into account.  When inserting into dynamic partitions, we don't 
know on the driver side which partition directories will be touched before 
issuing the write job.  Checking all partition directories is simply too 
expensive for tables with thousands of partitions.

3.  Add an extra component to output file names to avoid name collisions

This seems to be the only reasonable solution for now.

Currently, the ORC data source adds `System.currentTimeMillis` to the 
output file name.  This is not 100% safe, but it only fails when two tasks with 
the same task ID (which implies they belong to two separate concurrent jobs) 
are writing to the same location within the same millisecond, which is 
relatively unlikely to happen.  The benefit of using a timestamp here is that 
record order can be preserved.

Another obvious choice is to add a UUID to the output file name.  The 
benefit is that this practically avoids name collisions.  The drawback is that 
record order is no longer preserved.

However, we never promise to preserve record order when writing data, 
and Hive doesn't promise this either (the `_copy_N` trick breaks record order).

To sum up, adding a UUID to the output file name seems to be the simplest 
and safest way to fix this issue.
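
The difference between the two naming options can be sketched as follows. This is an illustration only: the `part-r-<task>-<suffix>` layout and the names `FileNameSchemes`, `timestampName`, and `uuidName` are hypothetical, not the formats used by the ORC or Parquet data sources.

```scala
import java.util.UUID

object FileNameSchemes {
  // Timestamp-based name: unique only as long as no other job writes the
  // same task ID to the same location within the same millisecond.
  def timestampName(taskId: Int, millis: Long): String =
    f"part-r-$taskId%05d-${millis}.orc"

  // UUID-based name: practically collision-free, but the suffix carries
  // no ordering information.
  def uuidName(taskId: Int, jobUuid: UUID): String =
    f"part-r-$taskId%05d-${jobUuid}.parquet"
}
```

Two concurrent jobs that both run task 3 in the same millisecond produce identical timestamp-based names, whereas their UUID-based names differ because each job draws its own random UUID.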



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113020276
  
  [Test build #35077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35077/console) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113020322
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113010743
  
OK, found out that those integers are printed by `SQLQuerySuite.test script 
transform for stderr`. See [Josh's comment] [1].



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113009538
  
  [Test build #35077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35077/consoleFull) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113009510
  
The last build failure looks pretty weird: a large part of the Jenkins build 
log output is replaced by tens of thousands of lines of integer triples, and 
none of the 5 test failures can be reproduced locally.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113009233
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113009265
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-113009144
  
retest this please



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/6864#discussion_r32690469
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -470,6 +470,33 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
   checkAnswer(sqlContext.table("t"), df.select('b, 'c, 'a).collect())
 }
   }
+
+  // NOTE: This test suite is not super deterministic.  On nodes with only relatively few cores
+  // (4 or even 1), it's hard to reproduce the data loss issue.  But on nodes with for example 8 or
+  // more cores, the issue can be reproduced steadily.  Fortunately our Jenkins builder meets this
+  // requirement.  We probably want to move this test case to spark-integration-tests or spark-perf
+  // later.
+  test("SPARK-8406") {
--- End diff --

Can you add a description in addition to the JIRA?



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112981099
  
  [Test build #35066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35066/console) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112981114
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112973920
  
  [Test build #35066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35066/consoleFull) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112973514
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112973535
  
Merged build started.



[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...

2015-06-17 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/6864#issuecomment-112973573
  
The background and alternative solutions for this issue are a little 
complex. I will give a summary of the offline discussion with @yhuai here later.


