[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034625#comment-15034625
 ] 

Thomas Graves commented on SPARK-11701:
---------------------------------------

So it looks like this is a race condition. If a task-end event (in this 
case probably from a speculative copy of a task) comes in after the stage has 
finished, DAGScheduler.handleTaskCompletion skips the task completion event:

    if (!stageIdToStage.contains(task.stageId)) {
      // Skip all the actions if the stage has been cancelled.
      return
    }

Since it returns here, it never posts the SparkListenerTaskEnd event, so the 
UI is never updated. I'm assuming this also affects the dynamic allocation 
accounting (at least in 1.5). I still have to confirm whether it exists in 1.6.

> YARN - dynamic allocation and speculation active task accounting wrong
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11701
>                 URL: https://issues.apache.org/jira/browse/SPARK-11701
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage has completed.  I think it's also 
> preventing dynamic allocation from releasing containers, because it 
> thinks there are still tasks running.
> It's easily reproduced with spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent-sized file with the speculation parameters 
> set low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g
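The wordcount from the repro above can be sketched in spark-shell like this (the input path is a placeholder, not from the original report; any reasonably large text file will do):

```scala
// Run inside a spark-shell started with the flags above; `sc` is provided.
// Use a file large enough that the low speculation thresholds
// (multiplier=0.2, quantile=0.1) trigger duplicate task attempts.
val counts = sc.textFile("hdfs:///tmp/large-input.txt") // placeholder path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.count()
// After the job completes, check the Executors tab in the UI: active
// task counts can remain non-zero even though the stage is finished.
```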



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
