Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8687#discussion_r39344473
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -984,6 +986,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
           hadoopConf.setOutputCommitter(classOf[FileOutputCommitter])
         }
     
    +    // When speculation is on and output committer class name contains "Direct", we should warn
    +    // users that they may loss data if they are using a direct output committer.
    +    val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
    +    if (speculationEnabled &&
    +      hadoopConf.get("mapred.output.committer.class", "").contains("Direct")) {
    +      logWarning("We may loss data when use direct output committer with speculation enabled, " +
    +        "please make sure your output committer doesn't write data directly.")
    +    }
    --- End diff --
    
    How about
    ```
    val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
      val warningMessage =
        s"$outputCommitterClass may be an output committer that writes data directly to the final location. " +
        "Because speculation is enabled, this output committer may cause data loss (see the case in SPARK-10063). " +
        "If possible, please use an output committer that does not have this behavior (e.g. FileOutputCommitter)."
      logWarning(warningMessage)
    }
    ```
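
    For context, a minimal sketch (not part of the PR) of a job setup that would hit this warning path. The committer class name and output path are illustrative placeholders, and the committer class would need to actually exist on the classpath:
    ```
    import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
    import org.apache.spark.{SparkConf, SparkContext}

    // Speculative execution enabled at the application level.
    val sparkConf = new SparkConf()
      .setAppName("direct-committer-warning-example")
      .set("spark.speculation", "true")
    val sc = new SparkContext(sparkConf)

    // Hypothetical committer class name; any value containing "Direct" matches the check.
    val jobConf = new JobConf(sc.hadoopConfiguration)
    jobConf.set("mapred.output.committer.class", "org.example.DirectOutputCommitter")

    // Saving a pair RDD through the old Hadoop API goes through PairRDDFunctions,
    // so the check above would log the warning before the write starts.
    sc.parallelize(Seq(("key", "value")))
      .saveAsHadoopFile("/tmp/direct-committer-example",
        classOf[String], classOf[String], classOf[TextOutputFormat[String, String]], jobConf)
    ```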

