Github user squito commented on the issue:

    https://github.com/apache/spark/pull/16959
  
    @kayousterhout 
    > This commit makes me worried there are more bugs related to #16620. For 
example, what if a task was OK'ed to commit, but then DAGScheduler decides to 
ignore it because of the epoch. The DAGScheduler / TaskSetManager will attempt 
to re-run the task, but the output commit will never be OK'ed, which will cause 
the task to fail a bunch of times and the stage to get aborted. Maybe this is 
OK because it's unlikely a stage will both be a shuffle map stage and also save 
output to HDFS? Thoughts?
    
    yes, I think you are right, both about the bug and that it's pretty 
unlikely.  It looks like `SparkHadoopMapRedUtil` is a public class, so a user 
could write to HDFS inside a map stage, but that would be pretty unusual.
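    To make the failure mode concrete, here is a toy model of first-committer-wins 
commit coordination (a simplified sketch of the scenario above, not Spark's actual 
`OutputCommitCoordinator` implementation; the class and method names below are 
hypothetical):

```python
class CommitCoordinator:
    """Toy first-committer-wins coordinator: the first attempt to ask
    for a (stage, partition) is authorized; all later attempts are denied."""

    def __init__(self):
        # (stage, partition) -> attempt number authorized to commit
        self.authorized = {}

    def can_commit(self, stage, partition, attempt):
        key = (stage, partition)
        # Record this attempt as the winner if no winner exists yet.
        winner = self.authorized.setdefault(key, attempt)
        return winner == attempt


coord = CommitCoordinator()

# Attempt 0 asks to commit partition 3 of stage 1 and wins.
assert coord.can_commit(stage=1, partition=3, attempt=0)

# Suppose the scheduler then ignores attempt 0's completion (e.g. a
# stale-epoch decision) and re-runs the task as attempt 1. The toy
# coordinator still remembers attempt 0 as the authorized committer,
# so every retry is denied: the re-run task can never commit, fails
# repeatedly, and the stage would eventually be aborted.
assert not coord.can_commit(stage=1, partition=3, attempt=1)
assert not coord.can_commit(stage=1, partition=3, attempt=2)
```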

