tgravescs commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-574796813

So please update the description with information from the other PR. The description should basically have a high-level design of this approach, with enough detail that someone can read it before reading the code and check that the code does what you are proposing. It should say which cases it covers and which cases you know it doesn't.

So the case you mentioned it doesn't cover:

> The case I am referring to is: imagine you have 2 resources and an "all resource offer" is scheduled every second. When TSM1 is submitted, it'll also get an "all resource offer", and assume it rejects both, causing a preexisting TSM2 to utilize them. Assume those 2 tasks finish, and the freed resources are offered one by one to TSM1, which accepts both, all within 1 second (before any "all resource offer"). This should reset the timer, but it won't in the implementation.

So the issue here is that we aren't really tracking when all resources are used; we are proxying that. Calculating the free slots exactly is pretty complex once you take blacklisting into account (it exists at both the application and taskset level). I'm kind of thinking at this point that the above case is OK: it favors not delaying, and it will be fixed up on the next "all resource offer".

One thing I don't think I like is that if you are fully scheduled, we keep trying to schedule "all resources", but if there are no resources, we continue to reset the timer. This means it takes a long time to fall back in the case where you have multiple tasksets, the first taskset rejects an offer and the second one takes it, and the tasks are finishing such that you get an all-resources offer in between the task finishes. In this scenario the first taskset can get starved. We would perhaps need to track this separately.

I need to take another walk through all the scenarios as well.
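
The starvation scenario above can be sketched as a toy timeline. This is a minimal Python sketch, not Spark's scheduler code: the function `simulate`, the flag `reset_on_empty_offer`, the 3-second locality wait, and the half-second offsets are all hypothetical and chosen only for illustration. It assumes TSM1 only ever sees non-local slots, so it rejects every offer until its locality wait expires, while TSM2 takes whatever TSM1 rejects.

```python
def simulate(reset_on_empty_offer, locality_wait=3.0, horizon=10):
    """Return the time at which TSM1 launches its first task, or None if starved.

    Toy model (assumption): the only thing that resets TSM1's timer is an
    "all resource offer" in which TSM1 rejected nothing.
    """
    timer_start = 0.0  # start of TSM1's locality-wait timer

    for second in range(horizon):
        # t = second + 0.5: a TSM2 task finishes and its slot is offered alone.
        now = second + 0.5
        if now - timer_start >= locality_wait:
            return now  # TSM1 has waited long enough and takes the non-local slot
        # Otherwise TSM1 rejects the slot and TSM2 takes it, so nothing stays free.

        # t = second + 1: periodic "all resource offer", but no slot is free.
        now = second + 1.0
        rejected_by_tsm1 = 0  # nothing was offered, so nothing was rejected
        if rejected_by_tsm1 == 0 and reset_on_empty_offer:
            timer_start = now  # treated as "fully utilized": the timer resets

    return None


if __name__ == "__main__":
    print("reset on empty offers:", simulate(reset_on_empty_offer=True))
    print("no reset on empty offers:", simulate(reset_on_empty_offer=False))
```

Under these assumptions, resetting on empty offers means TSM1 never falls back within the simulated horizon, while skipping the reset lets it launch at t = 3.5. That is one way to see why the empty-offer case may need to be tracked separately.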