[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064200#comment-15064200 ]

Thomas Graves commented on SPARK-11701:
---------------------------------------

I ran into another instance of this, and it's when the job has multiple 
stages: if it's not the last stage and both speculative copies of a task 
finish, they are both marked as success.  One of them gets ignored, which can 
leave the task counts wrong and leaves the UI showing that the executor still 
has an active task.

15/12/18 16:01:08 INFO scheduler.TaskSetManager: Ignoring task-finished event 
for 8.1 in stage 0.0 because task 8 has already completed successfully

In this case the task-commit code and the DAG scheduler won't handle it; 
TaskSetManager.handleSuccessfulTask needs to handle it.
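
To make the bookkeeping concrete, below is a minimal, self-contained Scala 
sketch of the problem (this is NOT the actual TaskSetManager code; the 
SpeculationTracker name and its methods are made up for illustration). It 
shows why the "ignored" duplicate completion still has to be removed from its 
executor's running-task count:

  // Minimal sketch of the bookkeeping issue described above: two speculative
  // copies of the same task index finish successfully, and the "ignored"
  // duplicate must still be removed from its executor's running-task count.
  import scala.collection.mutable

  class SpeculationTracker {
    // task index -> already completed successfully?
    private val successful = mutable.Map[Int, Boolean]().withDefaultValue(false)
    // executor id -> number of task attempts currently running on it
    val runningOnExecutor = mutable.Map[String, Int]().withDefaultValue(0)

    def taskLaunched(index: Int, execId: String): Unit =
      runningOnExecutor(execId) += 1

    def taskFinished(index: Int, execId: String): Unit = {
      // The attempt is no longer running on this executor, whether or not
      // another copy of the same index already succeeded.  Skipping this step
      // for the "ignored" duplicate is what leaves a stale active-task count.
      runningOnExecutor(execId) -= 1
      if (successful(index)) {
        println(s"Ignoring task-finished event for index $index: already completed")
      } else {
        successful(index) = true
      }
    }
  }

  // Task 8 runs on exec-1; a speculative copy of it runs on exec-2.
  val tracker = new SpeculationTracker
  tracker.taskLaunched(8, "exec-1")
  tracker.taskLaunched(8, "exec-2")
  tracker.taskFinished(8, "exec-1")  // original copy wins
  tracker.taskFinished(8, "exec-2")  // duplicate: ignored, but still cleaned up
  assert(tracker.runningOnExecutor("exec-2") == 0)  // no phantom active task

Without the unconditional decrement in taskFinished, exec-2 would keep 
reporting one active task forever, which matches the stale count seen in the 
Executors UI.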

> YARN - dynamic allocation and speculation active task accounting wrong
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11701
>                 URL: https://issues.apache.org/jira/browse/SPARK-11701
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>            Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage has completed.  I think it's also 
> preventing dynamic allocation from releasing containers, because it thinks 
> there are still tasks running.
> It's easily reproduced with spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent-sized file (see the wordcount sketch below) 
> with the speculation parameters set low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g
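
For reference, here is a wordcount along the lines described above that can 
be pasted into the spark-shell session started with that command (the HDFS 
path is just a placeholder):

  // Paste into the spark-shell started above; the input path is a placeholder.
  val counts = sc.textFile("hdfs:///tmp/decent-sized-file.txt")
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
  counts.count()

The reduceByKey introduces a shuffle, so the job has more than one stage, 
which is the case the comment above calls out.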


