Shengyue Ji created SPARK-11460:
-----------------------------------

             Summary: Locality waits should be based on task set creation time, not last launch time
                 Key: SPARK-11460
                 URL: https://issues.apache.org/jira/browse/SPARK-11460
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.5.1, 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1, 1.0.0
         Environment: YARN
            Reporter: Shengyue Ji
Spark waits for the spark.locality.wait period before falling back from RACK_LOCAL to ANY when selecting an executor for task assignment. The problem is that this timeout is effectively reset each time a new task is launched.

We were running Spark Streaming on Kafka with a 10-second batch window, 32 Kafka partitions, and 16 executors. All executors were in the ANY locality group. At one point a single RACK_LOCAL executor was added and all tasks were assigned to it. Each task took about 0.6 seconds to process, repeatedly resetting the spark.locality.wait timeout (3000ms). This caused the job to underutilize its resources and created a growing backlog.

The locality wait should be based on the task set's creation time, not the last launch time, so that 3000ms after the task set is created, all executors can have tasks assigned to them. As a workaround we are specifying a zero timeout for now to disable the locality optimization.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
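The reset behavior described above can be illustrated with a small standalone simulation. This is a sketch in Python, not Spark's actual Scala code; the function names are hypothetical, and the constants mirror the numbers in this report (0.6s tasks, 3000ms locality wait):

```python
# Hypothetical sketch of the two timeout policies (not Spark code).
# Scenario from the report: one RACK_LOCAL executor launches a task
# every 600ms, with spark.locality.wait = 3000ms. All times in ms.

LOCALITY_WAIT_MS = 3000  # spark.locality.wait
TASK_TIME_MS = 600       # per-task processing time observed in the report

def can_fall_back_to_any(now_ms, last_launch_ms):
    """Current behavior: the wait is measured from the last task launch."""
    return now_ms - last_launch_ms >= LOCALITY_WAIT_MS

def can_fall_back_to_any_fixed(now_ms, creation_ms):
    """Proposed behavior: the wait is measured from task set creation."""
    return now_ms - creation_ms >= LOCALITY_WAIT_MS

# Simulate 10 seconds of the streaming job.
creation_ms = 0
last_launch_ms = 0
now_ms = 0
fell_back_current = None  # time at which current behavior would expand to ANY
fell_back_fixed = None    # time at which proposed behavior would expand to ANY
while now_ms <= 10_000:
    if fell_back_current is None and can_fall_back_to_any(now_ms, last_launch_ms):
        fell_back_current = now_ms
    if fell_back_fixed is None and can_fall_back_to_any_fixed(now_ms, creation_ms):
        fell_back_fixed = now_ms
    last_launch_ms = now_ms  # a task is launched, resetting the current-behavior clock
    now_ms += TASK_TIME_MS

print(fell_back_current)  # None: the timeout never expires while tasks keep launching
print(fell_back_fixed)    # 3000: falls back to ANY 3000ms after task set creation
```

Because each 600ms task launch resets the clock under the current behavior, the RACK_LOCAL-to-ANY fallback never fires, which matches the growing backlog we observed.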