[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-15 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140479545
  
I tested it locally with the following code:
```
sparkContext.conf.set("spark.speculation", "true")

sparkContext.hadoopConfiguration.set("mapred.output.committer.class",
  "org.apache.spark.sql.hive.execution.DirectDummyOutputCommitter")
sparkContext.makeRDD(Seq(1, 2)).saveAsTextFile("tmp")


sparkContext.hadoopConfiguration.set("mapreduce.job.outputformat.class",
  "org.apache.spark.sql.hive.execution.DummyOutputFormatter")
sparkContext.hadoopConfiguration.set("mapred.output.dir", "tmp")
sparkContext.makeRDD(Seq(1 -> "a", 2 -> "b"))
  .saveAsNewAPIHadoopDataset(sparkContext.hadoopConfiguration)
```

`DummyOutputFormatter` is a subclass of `FileOutputFormat` that overrides the 
`getOutputCommitter` method to return a customized `OutputCommitter` with 
"Direct" in its name.

The warning message does get logged in both cases. The Hive write path 
should behave similarly.
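
The `DummyOutputFormatter` class itself is not shown in this comment. A minimal sketch of what such a class could look like, assuming the Hadoop `mapreduce` (new) API is on the classpath. The committer name `DirectNoOpCommitter`, the `AnyRef` type parameters, and the no-op method bodies are hypothetical and are not the code used in the test above:

```scala
import org.apache.hadoop.mapreduce.{JobContext, OutputCommitter, RecordWriter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Hypothetical committer. The only property that matters for the check is
// that its class name contains "Direct"; every method is a no-op.
class DirectNoOpCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = ()
  override def setupTask(taskContext: TaskAttemptContext): Unit = ()
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = ()
  override def abortTask(taskContext: TaskAttemptContext): Unit = ()
}

// Sketch of an output format that hands back the committer above instead of
// the default FileOutputCommitter.
class DummyOutputFormatter extends FileOutputFormat[AnyRef, AnyRef] {
  // Discards every record; a real format would write to the output path.
  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[AnyRef, AnyRef] =
    new RecordWriter[AnyRef, AnyRef] {
      override def write(key: AnyRef, value: AnyRef): Unit = ()
      override def close(context: TaskAttemptContext): Unit = ()
    }

  override def getOutputCommitter(context: TaskAttemptContext): OutputCommitter =
    new DirectNoOpCommitter
}
```

With a class like this, a name-based check can only see the committer after asking the output format for it, which is why this path has to be exercised separately from the `mapred.output.committer.class` setting.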


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140122746
  
Merged build started.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140124239
  
[Test build #42429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/consoleFull) for PR 8687 at commit [`69b7d65`](https://github.com/apache/spark/commit/69b7d6588dd2c33b2c8a643ba0efde7499266160).





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140122699
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140137297
  
LGTM. Will merge to master once it passes jenkins.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140170765
  
[Test build #42429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/console) for PR 8687 at commit [`69b7d65`](https://github.com/apache/spark/commit/69b7d6588dd2c33b2c8a643ba0efde7499266160).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140170934
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140170937
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42429/
Test PASSed.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-140173122
  
Thanks. Merging to master.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8687





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8687#discussion_r39344473
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -984,6 +986,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   hadoopConf.setOutputCommitter(classOf[FileOutputCommitter])
 }
 
+// When speculation is on and output committer class name contains "Direct", we should warn
+// users that they may loss data if they are using a direct output committer.
+val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
+if (speculationEnabled &&
+  hadoopConf.get("mapred.output.committer.class", "").contains("Direct")) {
+  logWarning("We may loss data when use direct output committer with speculation enabled, " +
+    "please make sure your output committer doesn't write data directly.")
+}
--- End diff --

How about
```
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
if (speculationEnabled && outputCommitterClass.contains("Direct")) {
  val warningMessage =
    s"$outputCommitterClass may be an output committer that writes data directly to the final location. " +
    "Because speculation is enabled, this output committer may cause data loss (see the case in SPARK-10063). " +
    "If possible, please use an output committer that does not have this behavior (e.g. FileOutputCommitter)."
  logWarning(warningMessage)
}
```
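
To see how the name-based check behaves on its own, here is a minimal sketch with no Spark or Hadoop dependency. The object `DirectCommitterCheck` and the function `directCommitterWarning` are hypothetical names introduced for illustration; the condition and the message follow the suggestion above:

```scala
object DirectCommitterCheck {
  // Returns Some(warning) only when speculation is enabled AND the committer
  // class name contains the case-sensitive substring "Direct".
  def directCommitterWarning(
      speculationEnabled: Boolean,
      outputCommitterClass: String): Option[String] = {
    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
      Some(
        s"$outputCommitterClass may be an output committer that writes data directly to " +
          "the final location. Because speculation is enabled, this output committer may " +
          "cause data loss (see the case in SPARK-10063).")
    } else {
      None
    }
  }

  def main(args: Array[String]): Unit = {
    // Class name taken from the local test in this thread.
    val direct = "org.apache.spark.sql.hive.execution.DirectDummyOutputCommitter"
    directCommitterWarning(speculationEnabled = true, direct).foreach(println)

    // No warning: speculation is off, or the name does not contain "Direct".
    assert(directCommitterWarning(speculationEnabled = false, direct).isEmpty)
    assert(directCommitterWarning(
      speculationEnabled = true,
      "org.apache.hadoop.mapred.FileOutputCommitter").isEmpty)
  }
}
```

Because this is a substring match on the class name, it is a heuristic: a direct committer whose name lacks "Direct" is not detected, and an unrelated class whose name happens to contain "Direct" triggers the warning.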





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139737993
  
[Test build #42368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/consoleFull) for PR 8687 at commit [`db59c25`](https://github.com/apache/spark/commit/db59c25dc2c12d8fd2e44c118bfc2c47363f7d49).





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139737781
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139737782
  
Merged build started.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139745260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/
Test PASSed.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139745259
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9899][SQL] log warning for direct outpu...

2015-09-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8687#issuecomment-139745224
  
[Test build #42368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42368/console) for PR 8687 at commit [`db59c25`](https://github.com/apache/spark/commit/db59c25dc2c12d8fd2e44c118bfc2c47363f7d49).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

