[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411616#comment-15411616
 ] 

Hong Shen commented on SPARK-16709:
-----------------------------------

Sorry for the late reply. This is different from SPARK-14915; in fact, we have 
already applied the change that SPARK-14915 made.
{code}
    sched.dagScheduler.taskEnded(tasks(index), reason, null, null, info, 
taskMetrics)
    // If speculation is enabled, when one task attempt succeeds, the other
    // attempts with state RUNNING will be killed.
    // The killed tasks will call statusUpdate with state KILLED/FAILED.
    // In this case, the task should not be re-added.
    if (!successful(index)) {
      addPendingTask(index)
    }
    if (!isZombie && state != TaskState.KILLED
{code}

Here is the reason; the log shows:
{code}
16/07/28 05:22:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 
(TID 175, 10.215.146.81, partition 1,PROCESS_LOCAL, 1930 bytes)

16/07/28 05:28:35 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.1 
(TID 207, 10.196.147.232, partition 1,PROCESS_LOCAL, 1930 bytes)

16/07/28 05:28:48 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 
(TID 175) in 393261 ms on 10.215.146.81 (3/50)

16/07/28 05:34:11 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.1 
(TID 207, 10.196.147.232): TaskCommitDenied (Driver denied task commit) for 
job: 1, partition: 1, attemptNumber: 207
{code}
1. Task 1.0 in stage 1.0 starts.
2. Stage 1.0 fails; stage 1.1 starts.
3. Task 1.0 in stage 1.1 starts.
4. Task 1.0 in stage 1.0 finishes, so its commit for partition 1 wins.
5. Task 1.0 in stage 1.1 fails with TaskCommitDenied, then retries forever.
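The sequence above can be sketched as a minimal, self-contained Scala toy (not Spark's actual TaskSetManager; the names `TaskSetState` and `handleFailedTask` are hypothetical): once another attempt of the same partition has succeeded, a KILLED/FAILED status update must not re-add the index to the pending queue, otherwise the TaskCommitDenied failure loops forever.

```scala
import scala.collection.mutable

object SpeculationGuard {
  // Toy stand-in for the relevant TaskSetManager state:
  // successful(i) is true once any attempt of partition i has succeeded.
  final case class TaskSetState(
      successful: Array[Boolean],
      pending: mutable.Queue[Int])

  // Hypothetical analogue of the patched addPendingTask call site:
  // only re-queue the partition if no other attempt has already succeeded.
  def handleFailedTask(state: TaskSetState, index: Int): Unit = {
    if (!state.successful(index)) {
      state.pending.enqueue(index)
    }
  }

  def main(args: Array[String]): Unit = {
    // Partition 0 has no successful attempt; partition 1 already committed.
    val state = TaskSetState(Array(false, true), mutable.Queue.empty[Int])
    handleFailedTask(state, 0) // no success yet -> re-queued for retry
    handleFailedTask(state, 1) // already committed -> dropped, no retry
    println(state.pending.mkString(",")) // prints "0"
  }
}
```

Without the `!successful(index)` guard, the second call would re-queue partition 1, which is exactly the infinite-retry loop described above.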


> Task with commit failed will retry infinite when speculation set to true
> ------------------------------------------------------------------------
>
>                 Key: SPARK-16709
>                 URL: https://issues.apache.org/jira/browse/SPARK-16709
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen
>         Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), this task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
