[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6864 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114179593 LGTM. I am merging it to master.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114168411 @nemccarthy Yeah. This one should be in today. I am taking a final check now.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114052709 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114052689
[Test build #35437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35437/console) for PR 6864 at commit [`db7a46a`](https://github.com/apache/spark/commit/db7a46a169d1789ff221d7f84315edc3df04511a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user nemccarthy commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114047349 Any chance this can be merged today?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114034722 #6932 was opened to backport this PR to branch-1.4.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114031522 [Test build #35437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35437/consoleFull) for PR 6864 at commit [`db7a46a`](https://github.com/apache/spark/commit/db7a46a169d1789ff221d7f84315edc3df04511a).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114030426 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114030392 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114021183 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114021161
[Test build #35429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35429/console) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32907926
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --
Right, it's handled in `writeRows`. Agreed on adding more comments; I made multiple mistakes here myself...
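The control flow discussed in the review comments on `commitTask` can be sketched as follows. This is a minimal illustration, not the code in `commands.scala`: `Container`, its `events` log, and the simulated failure are hypothetical, and the sketch assumes that the abort is performed by the caller (`writeRows`) once it catches the wrapped exception, which the thread only states as "handled in `writeRows`".

```scala
import scala.collection.mutable.ArrayBuffer

object CommitFlowSketch {
  // Hypothetical stand-in for a writer container; it records what happened.
  class Container(failOnCommit: Boolean) {
    val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

    def commitTask(): Unit = {
      try {
        events += "close"
        if (failOnCommit) throw new IllegalStateException("simulated commit failure")
        events += "commit"
      } catch { case cause: Throwable =>
        // No abort here: the exception is wrapped and left to the caller.
        throw new RuntimeException("Failed to commit task", cause)
      }
    }

    def abortTask(): Unit = events += "abort"
  }

  // Assumed caller-side handling: abort exactly once, then rethrow.
  def writeRows(container: Container): Unit = {
    try {
      container.commitTask()
    } catch { case cause: Throwable =>
      container.abortTask()
      throw new RuntimeException("Task failed while writing rows", cause)
    }
  }
}
```

Keeping the abort in a single place means a failed commit is aborted once rather than once per layer that catches the exception.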
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-114016526 LGTM. Left two comments regarding adding comments/docs.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32907403
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --
Actually, I think we also need to add documentation to `InsertIntoHadoopFsRelation` to explain the flow of this command and how we handle different kinds of failures/errors.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32907374
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -417,15 +428,14 @@ private[sql] class DefaultWriterContainer(
       assert(writer != null, "OutputWriter instance should have been initialized")
       writer.close()
       super.commitTask()
-    } catch {
-      case cause: Throwable =>
-        super.abortTask()
-        throw new RuntimeException("Failed to commit task", cause)
+    } catch { case cause: Throwable =>
+      throw new RuntimeException("Failed to commit task", cause)
--- End diff --
This exception will be caught in `writeRows`, right? If so, can we add a comment and also explain how we will handle this exception?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113999842 [Test build #35429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35429/consoleFull) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113999024 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113998951 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113998876 test this please
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113998119 **[Test build #35423 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35423/console)** for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb) after a configured wait of `175m`.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113998128 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113980289 [Test build #35423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35423/consoleFull) for PR 6864 at commit [`99a73ab`](https://github.com/apache/spark/commit/99a73ab959ca95b41d6a69af8a48fea260d640fb).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113980248 It turns out that increasing the number of threads of the local `SparkContext` used by `TestHive` (and running the tests on a node with relatively more cores, such as our Jenkins builder) is quite useful for detecting concurrency-related bugs. SPARK-8501 and SPARK-8513 were both detected this way.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113980134 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113980139 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113974351 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113974339
[Test build #35417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35417/console) for PR 6864 at commit [`3207323`](https://github.com/apache/spark/commit/32073239d8a03c6c7404c9dab7555c0cae1b5455).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113971822 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113971804
[Test build #35416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35416/console) for PR 6864 at commit [`2738368`](https://github.com/apache/spark/commit/2738368eb04c256a4c28a802b5a3b12a5458fda8).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113971004
[Test build #950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/950/console) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32899080
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
       context: TaskAttemptContext): OutputWriter = {
     new SimpleTextOutputWriter(path, context) {
       override def close(): Unit = {
+        super.close()
--- End diff --
I decided to leave it there. The writer should be closed anyway. Otherwise it's leaked.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113966066 [Test build #35417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35417/consoleFull) for PR 6864 at commit [`3207323`](https://github.com/apache/spark/commit/32073239d8a03c6c7404c9dab7555c0cae1b5455).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113965968 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113965962 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898807
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
       context: TaskAttemptContext): OutputWriter = {
     new SimpleTextOutputWriter(path, context) {
       override def close(): Unit = {
+        super.close()
--- End diff --
I was thinking about S3, where a file is not actually created before the output stream is closed (the `PUT` operation happens in `NativeS3FsOutputStream.close()`). But `SimpleTextRelation` is only used for local testing, so yeah, this line is not necessary.
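The close-time behaviour described for S3 can be illustrated with a small in-memory sketch. It does not use any Hadoop or S3 API: `FakeObjectStore` and `BufferedUploadStream` are hypothetical names, and the only point is that data written to such a stream is not visible in the store until `close()` runs, which is why a writer that is never closed leaves no file behind.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import scala.collection.mutable

object CloseOnUploadSketch {
  // Hypothetical in-memory stand-in for an object store.
  class FakeObjectStore {
    val objects: mutable.Map[String, Array[Byte]] = mutable.Map.empty
  }

  // Buffers all writes locally and publishes the object only on close().
  class BufferedUploadStream(store: FakeObjectStore, key: String) extends OutputStream {
    private val buffer = new ByteArrayOutputStream()
    private var closed = false

    override def write(b: Int): Unit = buffer.write(b)

    override def close(): Unit = {
      if (!closed) {
        closed = true
        // The single "upload" step: nothing exists under `key` before this line.
        store.objects(key) = buffer.toByteArray
      }
    }
  }
}
```

With this shape, an overridden `close()` that skips `super.close()` would never publish the object at all.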
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898686
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -44,7 +44,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
     import org.apache.spark.sql.hive.test.TestHive.implicits._
     sparkContext
-      .makeRDD(1 to 10)
+      .makeRDD(1 to 100)
--- End diff --
The JIRA is already there: https://issues.apache.org/jira/browse/SPARK-8501 Adding the comments.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113964832 [Test build #35416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35416/consoleFull) for PR 6864 at commit [`2738368`](https://github.com/apache/spark/commit/2738368eb04c256a4c28a802b5a3b12a5458fda8).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898677
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -49,7 +49,7 @@ import scala.collection.JavaConversions._
 object TestHive extends TestHiveContext(
     new SparkContext(
-      System.getProperty("spark.sql.test.master", "local[2]"),
+      System.getProperty("spark.sql.test.master", "local[32]"),
--- End diff --
I think we'd better use a fixed number here to improve determinism (if we had used 32 from the beginning, the ORC bug would have been much easier to reproduce).
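The lookup in the diff above reads the test master from the `spark.sql.test.master` system property and falls back to a fixed default. A minimal sketch of that pattern, without any Spark dependency, is shown below; `TestMasterSketch` and the `localThreads` helper are hypothetical and exist only to make the fixed thread count explicit.

```scala
object TestMasterSketch {
  // Same lookup as in the diff: a system property with a fixed default.
  def testMaster(): String =
    System.getProperty("spark.sql.test.master", "local[32]")

  // Hypothetical helper: extract N from a "local[N]" master string.
  def localThreads(master: String): Option[Int] = {
    val Local = """local\[(\d+)\]""".r
    master match {
      case Local(n) => Some(n.toInt)
      case _        => None
    }
  }
}
```

Because the default is a literal rather than something derived from the host's core count, every run that leaves the property unset uses the same thread count, which is the determinism argument made in the comment. Passing `-Dspark.sql.test.master=...` to the JVM would still override it.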
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113964744 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113964742 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113964712 btw, do we need a PR for 1.4 backport?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113963997 [Test build #950 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/950/consoleFull) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898467
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -290,6 +298,9 @@ private[sql] abstract class BaseWriterContainer(
     setupIDs(0, 0, 0)
     setupConf()
+    ContextUtil.getConfiguration(job).set(
+      "spark.sql.sources.writeJobUUID", uniqueWriteJobId.toString)
--- End diff --
We need to add a comment to explain how this UUID gets sent to the executor side.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898397
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -156,6 +156,7 @@ class CommitFailureTestRelation(
       context: TaskAttemptContext): OutputWriter = {
     new SimpleTextOutputWriter(path, context) {
       override def close(): Unit = {
+        super.close()
--- End diff --
Do we need this?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898395
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -44,7 +44,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
     import org.apache.spark.sql.hive.test.TestHive.implicits._
     sparkContext
-      .makeRDD(1 to 10)
+      .makeRDD(1 to 100)
--- End diff --
Can we add a comment in this suite to document the problem we hit (and also create a JIRA)?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32898333
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -49,7 +49,7 @@ import scala.collection.JavaConversions._
 object TestHive extends TestHiveContext(
   new SparkContext(
-    System.getProperty("spark.sql.test.master", "local[2]"),
+    System.getProperty("spark.sql.test.master", "local[32]"),
--- End diff --
Maybe we should still use `local[*]`?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113735490 [Test build #35361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35361/console) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113735514 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113730049 [Test build #35361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35361/consoleFull) for PR 6864 at commit [`d412de7`](https://github.com/apache/spark/commit/d412de721b7d22a4f118e8b17dbc0faf5590e44f).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113730026 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113730019 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113729847

With help from @yhuai, I finally found the root cause of the `OrcSourceSuite` failures shown in previous Jenkins builds. [SPARK-8501] [1] has been opened to track that issue.

The reason it shows up in this PR, and couldn't be reproduced locally on my laptop, is that I changed the thread count of the local `SparkContext` used by `TestHiveContext` to `*`, which means 32 cores on Jenkins and 8 cores on my laptop. The testing data used in `OrcSourceSuite` consists of 10 rows, so the ORC table written on my laptop consists of 8 part-files that each contain some rows, while the one written on Jenkins consists of 32 part-files, some of which contain zero rows. It turned out that those empty ORC files messed things up. Please refer to [SPARK-8501] [1] for details.

For this reason, I made two more updates:

1. Changed `local[*]` to `local[32]` for more determinism. 32 is chosen because Jenkins has 32 cores, and it should be enough for detecting concurrency issues.
2. Increased the row count of the testing data used in `OrcSourceSuite` to 100 to temporarily work around the build failure. SPARK-8501 will be fixed in another PR.

[1]: https://issues.apache.org/jira/browse/SPARK-8501
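To make the part-file arithmetic in this comment concrete, here is a minimal sketch in plain Scala (no Spark dependency). It assumes the rows are split into contiguous, evenly sized slices, with slice `i` covering `[i * length / numSlices, (i + 1) * length / numSlices)`; that splitting rule and the names `SliceSketch`, `sliceSizes`, and `emptySlices` are illustrative assumptions, not code taken from the PR.

```scala
object SliceSketch {
  // Sizes of the slices obtained by splitting `length` rows into
  // `numSlices` contiguous ranges (assumed even-split rule, see lead-in).
  def sliceSizes(length: Int, numSlices: Int): Seq[Int] =
    (0 until numSlices).map { i =>
      val start = ((i.toLong * length) / numSlices).toInt
      val end = (((i + 1).toLong * length) / numSlices).toInt
      end - start
    }

  // Number of slices that would end up with zero rows,
  // i.e. the number of empty part-files under this assumption.
  def emptySlices(length: Int, numSlices: Int): Int =
    sliceSizes(length, numSlices).count(_ == 0)
}
```

Under this assumption, 10 rows over 8 slices leave no slice empty, 10 rows over 32 slices leave 22 slices empty, and 100 rows over 32 slices leave none empty, which matches the laptop versus Jenkins behaviour and the workaround described above.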
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113728195

@lianhuiwang Yeah, thanks for the reminder. We are also working on this issue, and it will be addressed in another PR.

Appending jobs that use output committers like `DirectParquetOutputCommitter` can be tricky to handle, since these committers write directly to the target directory without using any temporary folder (this can be super useful for S3, where file metadata operations and directory operations can be very slow). With this PR, however, the job-level UUID can be used to distinguish files written by different jobs.
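As a rough illustration of how a job-level UUID can keep files from different jobs apart, the sketch below builds a part-file name from a configuration map. The key `spark.sql.sources.writeJobUUID` comes from the diff under review; the plain `Map` standing in for the Hadoop configuration, the `WriteJobNaming` object, and the `part-r-<partition>-<uuid>.<extension>` pattern are assumptions made for illustration only.

```scala
object WriteJobNaming {
  // Configuration key taken from the diff under review.
  val JobUuidKey = "spark.sql.sources.writeJobUUID"

  // Hypothetical naming scheme: zero-padded partition number, then the job UUID.
  // `conf` is a plain Map standing in for the job's Hadoop configuration.
  def partFileName(conf: Map[String, String], partition: Int, extension: String): String = {
    val jobUuid = conf.getOrElse(JobUuidKey, sys.error(s"$JobUuidKey is not set"))
    "part-r-%05d-%s.%s".format(partition, jobUuid, extension)
  }
}
```

Two appending jobs that both write partition 3 into the same directory would then produce different file names, as long as each job sets its own UUID (for example via `java.util.UUID.randomUUID().toString`).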
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113701854 Thanks @liancheng. @chenghao-intel There is another situation that needs to be considered when using the data source interface: when some tasks have finished but the job fails because other tasks failed, all output files of this job need to be removed.
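The cleanup described in this comment could look roughly like the following sketch, which assumes that every file written by a job carries that job's identifier in its file name and that all files sit directly in one local directory. `FailedJobCleanup` and `removeJobOutput` are hypothetical names; a real implementation would go through the Hadoop `FileSystem` API and handle nested partition directories.

```scala
import java.io.File

object FailedJobCleanup {
  // Deletes every regular file directly under `dir` whose name contains
  // `jobUuid`, and returns the sorted names of the files actually deleted.
  def removeJobOutput(dir: File, jobUuid: String): Seq[String] = {
    // listFiles() returns null if `dir` is not a readable directory.
    val files = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    val candidates = files.filter(f => f.isFile && f.getName.contains(jobUuid))
    // Keep only the files whose deletion succeeded.
    val deleted = candidates.filter(f => f.delete())
    deleted.map(_.getName).sorted
  }
}
```

Files written by other jobs into the same directory are left untouched, because their names carry a different identifier.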
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113696384 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113696377 [Test build #35344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35344/console) for PR 6864 at commit [`d5698b2`](https://github.com/apache/spark/commit/d5698b216c32a77633fa58d356ea2155659f68ba).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113690721 [Test build #35344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35344/consoleFull) for PR 6864 at commit [`d5698b2`](https://github.com/apache/spark/commit/d5698b216c32a77633fa58d356ea2155659f68ba).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113690652 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113690646 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113641248 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113641157 [Test build #35313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35313/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113625705 [Test build #35313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35313/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113624005 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113624042 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113623626 Retesting to gather more test failure logs for diagnosis.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113623447 retest this please
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113581608 [Test build #936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/936/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113568792 [Test build #936 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/936/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32845485
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -290,6 +298,9 @@ private[sql] abstract class BaseWriterContainer(
     setupIDs(0, 0, 0)
     setupConf()
+    ContextUtil.getConfiguration(job).set(
+      "spark.sql.sources.writeJobUUID", uniqueWriteJobId.toString)
--- End diff --
Why do we use a Parquet method here? Are we expecting that `getConfiguration` does not exist in some versions of `Job`?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32844987
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala ---
@@ -70,7 +71,7 @@ private[sql] case class InsertIntoHadoopFsRelation(
       relation.paths.length == 1,
       s"Cannot write to multiple destinations: ${relation.paths.mkString(",")}")
-    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    val hadoopConf = new Configuration(sqlContext.sparkContext.hadoopConfiguration)
--- End diff --
Do we need this? We already do `val job = new Job(hadoopConf)` below. BTW, we need to add a comment explaining that `new Job` clones the conf.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113475454 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113475413 [Test build #35259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35259/console) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113462120 @yhuai I updated the PR description with a revised version of the summary from my comment above. This is ready for review.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113460593 [Test build #35259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35259/consoleFull) for PR 6864 at commit [`14a47b9`](https://github.com/apache/spark/commit/14a47b90192be295f57f6f72a0f9e77a2e6b52b4).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113460011 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113459995 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449949 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449937 [Test build #35258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35258/console) for PR 6864 at commit [`6d946bd`](https://github.com/apache/spark/commit/6d946bd5d3d4bc5b701d5ffa83e9d0934603faef).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449649 @chenghao-intel Thanks for the comment! Speculation is a great point that I hadn't noticed. I've updated this PR to use a job-level UUID instead of a task-level one, because essentially what we want is to avoid name collisions between different write jobs (potentially issued by different Spark applications). Within a single write job, the task ID alone is enough to avoid name collisions.
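The job-level naming described in this comment can be sketched roughly as follows. This is only an illustration, not the code from the PR: the object name `JobLevelUuid`, the helper `outputFileName`, and the exact file name layout are all assumptions. The point it shows is that every attempt of a given task within one job (including a speculative attempt) maps to the same name, while two different jobs never share a name in practice.

```scala
import java.util.UUID

object JobLevelUuid {
  // Hypothetical helper: the name layout "part-r-<taskId>-<jobUuid>.parquet"
  // is an assumption made for illustration, not the PR's actual format.
  // The UUID is generated once per write job and shared by all of its tasks.
  def outputFileName(jobUuid: UUID, taskId: Int): String =
    f"part-r-$taskId%05d-${jobUuid.toString}.parquet"
}
```

With a task-level UUID, two attempts of the same task would produce two different files; with a job-level UUID, they target the same name, so the task ID keeps names distinct within a job and the UUID keeps them distinct across jobs.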
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449522 [Test build #35258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35258/consoleFull) for PR 6864 at commit [`6d946bd`](https://github.com/apache/spark/commit/6d946bd5d3d4bc5b701d5ffa83e9d0934603faef).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449385 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113449360 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113042193 Thank you @liancheng for the summary; it is clear even to me, and I had not dug into this part before. One thing I have been thinking about since reviewing #6833: how do we remove the redundant files when the user switches on `speculative` while writing data via the data source interface?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113024897

Some background and a summary of the offline discussion with @yhuai about this issue:

In 1.4.0, we added `HadoopFsRelation` to abstract partition support for all data sources that are based on the Hadoop `FileSystem` interface. Specifically, this makes partition discovery, partition pruning, and writing dynamic partitions much easier for data sources. From the user's perspective, what the write path does is very similar to Hive. However, the two differ a lot internally.

When data are inserted into Hive tables via Spark SQL, `InsertIntoHiveTable` simulates Hive's behavior:

1. Write data to a temporary location
2. Commit the write job
3. Move data in the temporary location to the final destination location using
   - `Hive.loadTable()` for non-partitioned tables
   - `Hive.loadPartition()` for static partitions
   - `Hive.loadDynamicPartitions()` for dynamic partitions

The important part is that, when appending data to existing tables in step 3, `Hive.copyFiles()` is invoked to move the data (I find the name kind of confusing, since no "copying" occurs here; we are just moving and renaming files). If a file in the source directory and another file in the destination directory happen to have the same name, say `part-r-1.parquet`, the former is moved to the destination directory and renamed with a `_copy_N` suffix (`part-r-1_copy_1.parquet`). That's how Hive avoids name collisions.

Some alternative fixes considered:

1. Use a similar approach to Hive

   This approach is not preferred in Spark 1.4.0, mainly because file metadata operations in S3 tend to be slow, especially for tables with lots of files and/or partitions. That's why `InsertIntoHadoopFsRelation` just inserts into the destination directory directly, and is often used together with `DirectParquetOutputCommitter` to reduce latency when working with S3. This means we don't have the chance to do renaming, and must avoid name collisions from the beginning.

2. Same as 1.3, just move max part number detection back to the driver side

   This isn't doable because, unlike 1.3, 1.4 also takes dynamic partitioning into account. When inserting into dynamic partitions, we don't know on the driver side which partition directories will be touched before issuing the write job. Checking all partition directories is simply too expensive for tables with thousands of partitions.

3. Add an extra component to output file names to avoid name collisions

   This seems to be the only reasonable solution for now. Currently, the ORC data source adds `System.currentTimeMillis` to the output file name. This is not 100% safe, but it only fails when two tasks with the same task ID (which implies they belong to two separate concurrent jobs) write to the same location within the same millisecond, which is relatively unlikely to happen. The benefit of using a timestamp here is that record order can be preserved.

   Another quite obvious choice is to add a UUID to the output file name. The benefit is that this practically eliminates name collisions. The drawback is that record order is no longer preserved. However, we never promise to preserve record order when writing data, and Hive doesn't promise this either (the `_copy_N` trick breaks record order).

To sum up, adding a UUID to the output file name seems to be the simplest and safest way to fix this issue.
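For comparison, the `_copy_N` renaming that the comment attributes to Hive can be sketched as below. This is a minimal illustration under stated assumptions, not Hive's actual `Hive.copyFiles()` code: the object `CopySuffix` and the helper `resolve` are hypothetical names, and the set of existing file names stands in for a listing of the destination directory.

```scala
object CopySuffix {
  // Hypothetical helper, not Hive's implementation: returns `name` unchanged
  // if it is free in the destination, otherwise the first free
  // "<base>_copy_<N><ext>" variant, counting N up from 1.
  def resolve(existing: Set[String], name: String): String =
    if (!existing.contains(name)) name
    else {
      val dot = name.lastIndexOf('.')
      // Split "part-r-1.parquet" into ("part-r-1", ".parquet").
      val (base, ext) = if (dot < 0) (name, "") else name.splitAt(dot)
      Iterator.from(1)
        .map(n => s"${base}_copy_$n$ext")
        .find(candidate => !existing.contains(candidate))
        .get
    }
}
```

Note that this scheme needs a listing of the destination directory plus a rename per file, which is exactly the kind of file metadata work the comment says is slow on S3; a UUID in the file name avoids both.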
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113020276 [Test build #35077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35077/console) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113020322 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113010743 OK, found out that those integers are printed by `SQLQuerySuite.test script transform for stderr`. See [Josh's comment] [1].
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113009538 [Test build #35077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35077/consoleFull) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113009510 The last build failure looks pretty weird: a large part of the Jenkins build log output is replaced by tens of thousands of lines of integer triples, and none of the 5 test failures can be reproduced locally.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113009233 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113009265 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-113009144 retest this please
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6864#discussion_r32690469

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
    @@ -470,6 +470,33 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
           checkAnswer(sqlContext.table("t"), df.select('b, 'c, 'a).collect())
         }
       }
    +
    +  // NOTE: This test suite is not super deterministic. On nodes with only relatively few cores
    +  // (4 or even 1), it's hard to reproduce the data loss issue. But on nodes with for example 8 or
    +  // more cores, the issue can be reproduced steadily. Fortunately our Jenkins builder meets this
    +  // requirement. We probably want to move this test case to spark-integration-tests or spark-perf
    +  // later.
    +  test("SPARK-8406") {
    --- End diff --

Can you add a description in addition to the JIRA?
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112981099 [Test build #35066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35066/console) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112981114 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112973920 [Test build #35066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35066/consoleFull) for PR 6864 at commit [`e5e92f3`](https://github.com/apache/spark/commit/e5e92f31b86d1975c7cc69abf2421810b3f788e1).
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112973514 Merged build triggered.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112973535 Merged build started.
[GitHub] spark pull request: [SPARK-8406] [SQL] Adding UUID to output file ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6864#issuecomment-112973573 The background and alternative solutions for this issue are a little bit complex. I will post a summary of the offline discussion with @yhuai here later.