Shengyue Ji created SPARK-11460:
-----------------------------------

             Summary: Locality waits should be based on task set creation time, not last launch time
                 Key: SPARK-11460
                 URL: https://issues.apache.org/jira/browse/SPARK-11460
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.5.1, 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1, 1.0.0
         Environment: YARN
            Reporter: Shengyue Ji


Spark waits for the spark.locality.wait period before falling back from 
RACK_LOCAL to ANY when selecting an executor for assignment. However, the 
timeout is effectively reset each time a new assignment is made.

We were running Spark Streaming on Kafka with a 10-second batch window, 32 
Kafka partitions, and 16 executors. All executors were in the ANY group. At one 
point a single RACK_LOCAL executor was added, and all tasks were assigned to it. 
Each task took about 0.6 seconds to process, so the spark.locality.wait timeout 
(3000 ms) was reset repeatedly and never expired. As a result, the job 
underutilized its resources and built up a growing backlog.
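
A rough back-of-the-envelope check of these figures, written as a small 
Python sketch. It assumes the RACK_LOCAL executor runs one task at a time and 
that there is one task per Kafka partition; neither is stated in the report.

```python
# Figures taken from the report above.
num_tasks = 32                 # assumed: one task per Kafka partition
task_seconds = 0.6             # approximate per-task processing time
batch_window_seconds = 10.0    # streaming batch interval
locality_wait_seconds = 3.0    # spark.locality.wait (3000 ms)

# Assumption: the single RACK_LOCAL executor runs tasks strictly one by one.
serial_batch_seconds = num_tasks * task_seconds

# A launch roughly every 0.6 s restarts the 3 s timer before it can expire.
timer_can_expire = task_seconds >= locality_wait_seconds

# How far each batch overruns its window when only one executor does the work.
overrun_seconds = serial_batch_seconds - batch_window_seconds

print(f"serial batch time: {serial_batch_seconds:.1f} s")
print(f"locality timer can expire: {timer_can_expire}")
print(f"overrun per batch: {overrun_seconds:.1f} s")
```

Under these assumptions each batch takes longer than its window, which is 
consistent with the increasing backlog described above.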

spark.locality.wait should be measured from the task set creation time, not 
the last launch time, so that 3000 ms after the task set is created, all 
executors can have tasks assigned to them.
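
The difference between the two timer policies can be illustrated with a toy 
simulation. This is a simplified model for illustration only, not Spark's 
actual TaskSetManager logic: the function name simulate, the policy labels, 
the one-slot-per-executor assumption, and the 100 ms tick are all hypothetical.

```python
def simulate(policy, num_tasks=32, num_any=16, task_ms=600,
             wait_ms=3000, tick_ms=100):
    """Return the time (ms) at which the last task finishes.

    policy: "last_launch" restarts the locality timer on every launch on the
            rack-local executor; "creation" measures it from task set creation.
    Assumes one rack-local executor and num_any other executors, each running
    one task at a time (an assumption, not stated in the report).
    """
    pending = num_tasks
    timer_start = 0                # the task set is created at t = 0
    local_free_at = 0              # when the rack-local executor is next idle
    any_free_at = [0] * num_any    # same, for each non-local executor
    finish = 0
    now = 0
    while pending > 0:
        # The rack-local executor always qualifies for a task.
        if local_free_at <= now:
            pending -= 1
            local_free_at = now + task_ms
            finish = max(finish, local_free_at)
            if policy == "last_launch":
                timer_start = now  # the launch restarts the wait
        # Other executors qualify only once the wait has elapsed.
        if now - timer_start >= wait_ms:
            for i in range(num_any):
                if pending == 0:
                    break
                if any_free_at[i] <= now:
                    pending -= 1
                    any_free_at[i] = now + task_ms
                    finish = max(finish, any_free_at[i])
        now += tick_ms
    return finish


for policy in ("last_launch", "creation"):
    print(policy, simulate(policy), "ms")
```

In this model the "last_launch" policy never lets the wait expire, so every 
task runs on the single rack-local executor, while the "creation" policy 
opens up the remaining executors once 3000 ms have passed.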

We are specifying a zero timeout for now as a workaround to disable locality 
optimization. 

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556


