Jason Moore created SPARK-14915:
-----------------------------------

             Summary: Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
                 Key: SPARK-14915
                 URL: https://issues.apache.org/jira/browse/SPARK-14915
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.2
            Reporter: Jason Moore
            Priority: Critical


SPARK-14357 corrected the code toward the originally intended behavior: a 
CommitDeniedException (CDE) should not count toward the failure count for a 
job.  After running with this fix for a few weeks, it has become apparent that 
the change has an unintended consequence: a speculative task can continuously 
receive a CDE from the driver, which now causes it to fail and retry over and 
over without limit.
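
To make the loop concrete, here is a minimal toy model of the failure 
bookkeeping. It is not Spark code: ToyTaskSet, handle_failure, 
run_denied_attempts and the safety_cap parameter are hypothetical names 
invented for illustration, and the limit of 4 only mirrors the idea behind 
spark.task.maxFailures. It assumes the driver denies every commit request 
from the speculative attempt.

```python
from dataclasses import dataclass

# Illustrative limit, in the spirit of spark.task.maxFailures (assumption).
MAX_TASK_FAILURES = 4


@dataclass
class ToyTaskSet:
    """Toy stand-in for per-task failure bookkeeping (hypothetical, not Spark)."""
    counted_failures: int = 0
    attempts: int = 0
    aborted: bool = False

    def handle_failure(self, reason: str, count_commit_denied: bool) -> bool:
        """Record a failed attempt; return True if it should be resubmitted."""
        self.attempts += 1
        # Only count the failure if it is not a commit denial, or if commit
        # denials are (incorrectly) treated like ordinary failures.
        if reason != "CommitDenied" or count_commit_denied:
            self.counted_failures += 1
        if self.counted_failures >= MAX_TASK_FAILURES:
            self.aborted = True
            return False
        return True


def run_denied_attempts(count_commit_denied: bool, safety_cap: int = 1000) -> ToyTaskSet:
    """Resubmit an attempt that is always denied, stopping at safety_cap."""
    ts = ToyTaskSet()
    while ts.attempts < safety_cap:
        if not ts.handle_failure("CommitDenied", count_commit_denied):
            break
    return ts


# Denials counted as failures: the task set aborts at the limit.
before = run_denied_attempts(count_commit_denied=True)
# Denials not counted: only the artificial safety cap stops the loop.
after = run_denied_attempts(count_commit_denied=False)
print(before.attempts, before.aborted)
print(after.attempts, after.aborted)
```

In this model, once denials stop counting as failures, nothing but the 
artificial cap ends the resubmission loop, which is the behavior described 
above.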

I'm thinking we could put a task that receives a CDE from the driver into 
TaskState.FINISHED, or some other state that indicates the task shouldn't be 
resubmitted by the TaskScheduler. I'd like opinions on whether doing something 
like this would have other consequences.
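
As a rough sketch of that idea (again a toy model, not Spark's scheduler): 
ToyTaskState and on_attempt_ended are hypothetical names, and the reason 
strings are placeholders. It assumes that the attempt which was granted 
permission to commit goes on to succeed; if it does not, treating the denied 
attempt as terminal could leave the partition without a running attempt, 
which is one of the consequences that would need checking.

```python
from enum import Enum, auto


class ToyTaskState(Enum):
    """Simplified task states for illustration (hypothetical, not Spark's)."""
    RUNNING = auto()
    FINISHED = auto()
    FAILED = auto()


def on_attempt_ended(reason: str) -> tuple[ToyTaskState, bool]:
    """Return (new_state, resubmit) for an attempt that ended with `reason`."""
    if reason == "Success":
        return ToyTaskState.FINISHED, False
    if reason == "CommitDenied":
        # Proposed behavior: another attempt holds the right to commit, so
        # mark this attempt terminal and do not hand it back for a retry.
        return ToyTaskState.FINISHED, False
    # Any other failure keeps the usual path: mark failed and resubmit.
    return ToyTaskState.FAILED, True


denied_state, denied_resubmit = on_attempt_ended("CommitDenied")
other_state, other_resubmit = on_attempt_ended("ExceptionFailure")
print(denied_state.name, denied_resubmit)
print(other_state.name, other_resubmit)
```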


