Shengyue Ji created SPARK-11460:
-----------------------------------
Summary: Locality waits should be based on task set creation time,
not last launch time
Key: SPARK-11460
URL: https://issues.apache.org/jira/browse/SPARK-11460
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 1.5.1, 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0, 1.2.2, 1.2.1,
1.2.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1, 1.0.0
Environment: YARN
Reporter: Shengyue Ji
Spark waits for spark.locality.waits period before going from RACK_LOCAL to ANY
when selecting an executor for assignment. The timeout was essentially reset
each time a new assignment is made.
We were running Spark streaming on Kafka with a 10 second batch window on 32
Kafka partitions with 16 executors. All executors were in the ANY group. At one
point one RACK_LOCAL executor was added and all tasks were assigned to it. Each
task took about 0.6 second to process, resetting the spark.locality.wait
timeout (3000ms) repeatedly. This caused the whole process to under utilize
resources and created an increasing backlog.
spark.locality.wait should be based on the task set creation time, not last
launch time so that after 3000ms of initial creation, all executors can get
tasks assigned to them.
We are specifying a zero timeout for now as a workaround to disable locality
optimization.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]