[ https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Moore updated SPARK-14915:
--------------------------------
    Affects Version/s: 2.0.0
                       1.5.3

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14915
>                 URL: https://issues.apache.org/jira/browse/SPARK-14915
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.3, 1.6.2, 2.0.0
>            Reporter: Jason Moore
>            Assignee: Jason Moore
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> In SPARK-14357, code was corrected towards the originally intended behavior
> that a CommitDeniedException (CDE) should not count towards the failure
> count for a job. After having run with this fix for a few weeks, it's become
> apparent that this behavior has an unintended consequence: a speculative
> task will continuously receive a CDE from the driver, now causing it to fail
> and retry over and over without limit.
> I'm thinking we could put a task that receives a CDE from the driver into a
> TaskState.FINISHED or some other state to indicate that the task shouldn't
> be resubmitted by the TaskScheduler. I'd probably need some opinions on
> whether there are other consequences for doing something like this.
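
The retry loop described in the issue can be illustrated with a toy model. This is only a sketch and not Spark's scheduler code: `Outcome`, `run_attempt`, `schedule`, and the `resubmit_on_commit_denied` flag are hypothetical names invented for the illustration, and `max_rounds` exists only so the demonstration terminates. It assumes another attempt has already committed the partition, so every further attempt is denied.

```python
from enum import Enum, auto


class Outcome(Enum):
    SUCCESS = auto()
    COMMIT_DENIED = auto()


def run_attempt(partition_committed: bool) -> Outcome:
    # Assumption: only one attempt per partition is allowed to commit,
    # so any attempt that arrives after a commit is denied.
    return Outcome.COMMIT_DENIED if partition_committed else Outcome.SUCCESS


def schedule(resubmit_on_commit_denied: bool, max_rounds: int = 1000) -> int:
    """Return how many attempts ran before the toy scheduler stopped."""
    partition_committed = True  # another attempt already committed
    attempts = 0
    pending = True
    while pending and attempts < max_rounds:
        attempts += 1
        outcome = run_attempt(partition_committed)
        if outcome is Outcome.COMMIT_DENIED:
            # A denied commit is not counted as a failure, so no failure
            # limit can end the loop; only this flag decides whether the
            # task is handed back for another attempt.
            pending = resubmit_on_commit_denied
        else:
            pending = False
    return attempts


# Resubmitting after every denial runs until the artificial cap is hit.
print(schedule(resubmit_on_commit_denied=True, max_rounds=50))
# Treating the denied task as done stops after a single attempt.
print(schedule(resubmit_on_commit_denied=False))
```

In this model the second call corresponds to the proposal of moving the denied task into a finished state. Whether that is safe in the real scheduler is the open question raised in the issue.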