Zebing Lin created SPARK-30511:
----------------------------------

             Summary: Spark marks ended speculative tasks as pending, which 
leads to holding idle executors
                 Key: SPARK-30511
                 URL: https://issues.apache.org/jira/browse/SPARK-30511
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.3.0
            Reporter: Zebing Lin


*TL;DR*
When speculative tasks finish, fail, or get killed, they are still considered 
pending and count towards the number of executors Spark calculates it needs.
h3. Symptom

In one of our production jobs (running 4 tasks per executor), we found that it 
was holding 6 executors at the end with only 2 tasks running (1 of them 
speculative). With more logging enabled, we found the job printed:

 
{code:java}
pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2

{code}
 

while the job only had 1 speculative task running; the other 16 speculative 
tasks had been intentionally killed because their corresponding original tasks 
had finished.
h3. The Bug

Upon examining the code of _pendingSpeculativeTasks_:

 
{code:java}
stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
  numTasks - stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
}.sum
{code}
where _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is incremented in 
_onSpeculativeTaskSubmitted_ but never decremented; 
_stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on stage 
completion. *This means Spark marks ended speculative tasks as pending, which 
leads Spark to hold more executors than it actually needs!*
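The accounting above can be illustrated with a minimal, self-contained sketch 
(the class and method names here are hypothetical, modeled on the fields 
quoted above; the real logic lives in Spark's ExecutorAllocationManager). It 
shows that decrementing the per-stage counter when a speculative task ends, 
instead of only removing its index, brings _pendingSpeculativeTasks_ back to 0:

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Simplified model of the speculative-task bookkeeping described above.
class SpeculativeCountSketch {
    // Per stage attempt: how many speculative tasks were submitted.
    final Map<Integer, Integer> stageAttemptToNumSpeculativeTasks = new HashMap<>();
    // Per stage attempt: indices of speculative tasks currently running.
    final Map<Integer, HashSet<Integer>> stageAttemptToSpeculativeTaskIndices = new HashMap<>();

    void onSpeculativeTaskSubmitted(int stageAttempt) {
        stageAttemptToNumSpeculativeTasks.merge(stageAttempt, 1, Integer::sum);
    }

    void onSpeculativeTaskStart(int stageAttempt, int taskIndex) {
        stageAttemptToSpeculativeTaskIndices
            .computeIfAbsent(stageAttempt, k -> new HashSet<>())
            .add(taskIndex);
    }

    // The missing step: when a speculative task finishes, fails, or is
    // killed, decrement the counter as well as removing the running index.
    // Without the decrement, the ended task still looks "pending".
    void onSpeculativeTaskEnd(int stageAttempt, int taskIndex) {
        HashSet<Integer> indices = stageAttemptToSpeculativeTaskIndices.get(stageAttempt);
        if (indices != null) {
            indices.remove(taskIndex);
        }
        stageAttemptToNumSpeculativeTasks.merge(stageAttempt, -1, Integer::sum);
    }

    // Same computation as the snippet quoted above:
    // submitted speculative tasks minus those currently running.
    int pendingSpeculativeTasks() {
        return stageAttemptToNumSpeculativeTasks.entrySet().stream()
            .mapToInt(e -> e.getValue()
                - stageAttemptToSpeculativeTaskIndices
                    .getOrDefault(e.getKey(), new HashSet<>()).size())
            .sum();
    }
}
{code}

Replaying the scenario from the symptom section (17 speculative tasks 
submitted, 16 of them killed) against this sketch yields 0 pending speculative 
tasks; dropping the decrement in _onSpeculativeTaskEnd_ reproduces the 
inflated count seen in the logs.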

 

I will have a PR ready to fix this issue, along with SPARK-2840.