[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140479545

I tested it locally with this code:

```
// Turn on speculation so the warning condition can be met.
sparkContext.conf.set("spark.speculation", "true")

// Old-API path: the committer class is read from mapred.output.committer.class.
sparkContext.hadoopConfiguration.set("mapred.output.committer.class",
  "org.apache.spark.sql.hive.execution.DirectDummyOutputCommitter")
sparkContext.makeRDD(Seq(1, 2)).saveAsTextFile("tmp")

// New-API path: the committer is whatever the output format's getOutputCommitter returns.
sparkContext.hadoopConfiguration.set("mapreduce.job.outputformat.class",
  "org.apache.spark.sql.hive.execution.DummyOutputFormatter")
sparkContext.hadoopConfiguration.set("mapred.output.dir", "tmp")
sparkContext.makeRDD(Seq(1 -> "a", 2 -> "b"))
  .saveAsNewAPIHadoopDataset(sparkContext.hadoopConfiguration)
```

`DummyOutputFormatter` is a subclass of `FileOutputFormat` that overrides the `getOutputCommitter` method to return a customized `OutputCommitter` with "Direct" in its name, and the warning message does get logged. The Hive write path should be similar.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
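The new-API part of the test relies on `DummyOutputFormatter` overriding `getOutputCommitter`. The following is a minimal pure-Scala sketch of that pattern that runs without Hadoop on the classpath. `Committer`, `PlainCommitter`, `DirectDummyCommitter`, `BaseOutputFormat`, `DummyFormat` and `CommitterCheck.shouldWarn` are hypothetical stand-ins, not Hadoop or Spark classes, and the sketch assumes (as the comment suggests) that the new-API check looks at the runtime class name of the committer the format returns rather than at a configuration key.

```scala
// Hypothetical stand-in for Hadoop's OutputCommitter.
trait Committer

// Plays the role of FileOutputCommitter (no "Direct" in its name).
class PlainCommitter extends Committer

// Name contains "Direct", like the customized committer in the test.
class DirectDummyCommitter extends Committer

// Plays the role of FileOutputFormat with its default committer.
class BaseOutputFormat {
  def getOutputCommitter(): Committer = new PlainCommitter
}

// Like DummyOutputFormatter: a subclass that overrides getOutputCommitter.
class DummyFormat extends BaseOutputFormat {
  override def getOutputCommitter(): Committer = new DirectDummyCommitter
}

object CommitterCheck {
  // Hypothetical helper: warn only when speculation is on and the class name
  // of the committer actually returned by the format contains "Direct".
  def shouldWarn(format: BaseOutputFormat, speculationEnabled: Boolean): Boolean = {
    val committerClass = format.getOutputCommitter().getClass.getName
    speculationEnabled && committerClass.contains("Direct")
  }
}
```

Because the check inspects the committer instance rather than a config key, overriding `getOutputCommitter` alone is enough to trigger it, which is what the test above exercises.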
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140122746 Merged build started.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140124239 [Test build #42429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/consoleFull) for PR 8687 at commit [`69b7d65`](https://github.com/apache/spark/commit/69b7d6588dd2c33b2c8a643ba0efde7499266160).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140122699 Merged build triggered.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140137297 LGTM. Will merge to master once it passes jenkins.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140170765 [Test build #42429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/console) for PR 8687 at commit [`69b7d65`](https://github.com/apache/spark/commit/69b7d6588dd2c33b2c8a643ba0efde7499266160).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140170934 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140170937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-140173122 Thanks. Merging to master.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8687
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8687#discussion_r39344473

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -984,6 +986,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       hadoopConf.setOutputCommitter(classOf[FileOutputCommitter])
     }
+    // When speculation is on and output committer class name contains "Direct", we should warn
+    // users that they may loss data if they are using a direct output committer.
+    val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
+    if (speculationEnabled &&
+      hadoopConf.get("mapred.output.committer.class", "").contains("Direct")) {
+      logWarning("We may loss data when use direct output committer with speculation enabled, " +
+        "please make sure your output committer doesn't write data directly.")
+    }
--- End diff --

How about

```
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
if (speculationEnabled && outputCommitterClass.contains("Direct")) {
  val warningMessage =
    s"$outputCommitterClass may be an output committer that writes data directly to the final location. " +
    "Because speculation is enabled, this output committer may cause data loss (see the case in SPARK-10063). " +
    "If possible, please use an output committer that does not have this behavior (e.g. FileOutputCommitter)."
  logWarning(warningMessage)
}
```
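The suggested check can be exercised without a cluster. Below is a hypothetical pure-Scala extraction of it: `DirectCommitterWarning` and `warningFor` are illustrative names, a `Map[String, String]` stands in for the Hadoop configuration, and the function returns the message as an `Option` instead of calling `logWarning`. The message text follows the suggestion above.

```scala
object DirectCommitterWarning {
  // Old-API key the diff reads the committer class name from.
  val CommitterKey = "mapred.output.committer.class"

  // Hypothetical helper: returns the suggested warning text, or None when no warning applies.
  // `conf` is a plain Map standing in for Hadoop's configuration (assumption for this sketch).
  def warningFor(conf: Map[String, String], speculationEnabled: Boolean): Option[String] = {
    val outputCommitterClass = conf.getOrElse(CommitterKey, "")
    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
      Some(
        s"$outputCommitterClass may be an output committer that writes data directly to " +
          "the final location. Because speculation is enabled, this output committer may " +
          "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
          "committer that does not have this behavior (e.g. FileOutputCommitter).")
    } else {
      None
    }
  }
}
```

Note that the test is a plain substring match on the class name, so it is a heuristic: a committer that writes directly to the final location but lacks "Direct" in its name is not flagged.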
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139737993 [Test build #42368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/consoleFull) for PR 8687 at commit [`db59c25`](https://github.com/apache/spark/commit/db59c25dc2c12d8fd2e44c118bfc2c47363f7d49).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139737781 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139737782 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139745260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139745259 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8687#issuecomment-139745224 [Test build #42368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/console) for PR 8687 at commit [`db59c25`](https://github.com/apache/spark/commit/db59c25dc2c12d8fd2e44c118bfc2c47363f7d49).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.