Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 So I think the issue with the locality wait is that the timer (the 3s wait) resets whenever any task is scheduled at a particular locality level (in this case node local) on any node. So it can take a lot longer than 3 seconds for a specific task to fall back to rack local; if no tasks are node local on a given node, it can wait a long time before falling back. I think it would be more ideal to look at this on a per-task basis. I don't see a reason to have a task wait 60+ seconds while skipping over rack-local nodes. Locality doesn't matter that much for the majority of applications, and you're just wasting startup time. I still need to look more at the scheduler logic to confirm, but either way I think this change is good to have. I'll be filing a separate JIRA for that shortly.
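To illustrate the behavior described above, here is a minimal, hypothetical sketch (not the real TaskSetManager API) of delay scheduling where the fallback timer is tracked per locality level rather than per task: any node-local launch resets the timer, so a task with no node-local placement can wait far longer than the configured 3 seconds before falling back to rack local.

```java
// Hypothetical, simplified model of the per-level locality wait timer.
// Names (DelaySchedulingSketch, canFallBack, etc.) are illustrative only.
public class DelaySchedulingSketch {
    static final long LOCALITY_WAIT_MS = 3000L; // spark.locality.wait default

    // Timestamp of the most recent NODE_LOCAL launch. Note this is tracked
    // per *level*, not per task: any node-local launch updates it.
    long lastNodeLocalLaunchMs = 0L;

    // A task may fall back to RACK_LOCAL only once the level timer expires.
    boolean canFallBack(long nowMs) {
        return nowMs - lastNodeLocalLaunchMs >= LOCALITY_WAIT_MS;
    }

    public static void main(String[] args) {
        DelaySchedulingSketch s = new DelaySchedulingSketch();
        long now = 0L;
        // Other tasks keep launching node-local every 2s, resetting the
        // timer each time, so a task that has no node-local placement
        // never gets a chance to fall back.
        for (int i = 0; i < 5; i++) {
            now += 2000L;
            s.lastNodeLocalLaunchMs = now; // reset on any node-local launch
        }
        now += 2000L; // 12s elapsed in total, far past the 3s wait
        System.out.println("elapsed=" + now + "ms canFallBack=" + s.canFallBack(now));
    }
}
```

With a per-task timer, the waiting task would have fallen back to rack local after 3 seconds regardless of what other tasks were doing at the node-local level.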