[GitHub] [spark] tgravescs commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

GitBox Wed, 15 Jan 2020 10:45:09 -0800

tgravescs commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality 
wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-574796813
 
 
   So please update the description with the information from the other PR. The 
description should give a high-level design of this approach, with enough 
detail that someone can check, before reading the code, that the code does 
what you are proposing. It should also list which cases it covers and which 
cases you know it doesn't. 
   
   As for the case you mentioned that it doesn't cover:
   
   > The case I am referring to is: imagine you have 2 resources and an "all 
resource offer" is scheduled every second. When TSM1 is submitted, it'll also 
get an "all resource offer"; assume it rejects both, causing a preexisting 
TSM2 to utilize them. Assume those 2 tasks finish, and the freed resources are 
offered one by one to TSM1, which accepts both, all within 1 second (before any 
"all resource offer"). This should reset the timer, but it won't in the 
implementation.
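
   To make the quoted scenario concrete, here is a minimal Python sketch of a hypothetical model of the proposed rule. It assumes the timer is reset only on an "all resource offer" in which the taskset rejected nothing for locality, and that single-resource offers never touch the timer. `ProposedTimer`, `on_offer`, and `run_scenario` are illustrative names, not Spark's `TaskSetManager` API, and the offer times are made up.

```python
from dataclasses import dataclass


@dataclass
class ProposedTimer:
    """Hypothetical model of one taskset's locality-wait timer.

    Assumption: the timer is reset only on an "all resource offer" in which the
    taskset rejected nothing because of locality; single-resource offers never
    reset it, whether accepted or not.
    """

    last_reset: float = 0.0  # time the wait last started counting from

    def on_offer(self, now: float, all_resources: bool, rejected_any: bool) -> None:
        if all_resources and not rejected_any:
            self.last_reset = now


def run_scenario() -> ProposedTimer:
    # t=0.0: TSM1 is submitted and its timer starts.
    tsm1 = ProposedTimer(last_reset=0.0)
    # TSM1 gets an "all resource offer" for both resources and rejects both,
    # so the preexisting TSM2 runs on them instead (no reset).
    tsm1.on_offer(0.0, all_resources=True, rejected_any=True)
    # TSM2's two tasks finish and each freed resource is offered to TSM1 on its
    # own; TSM1 accepts both, before the next all-resource offer at t=1.0.
    tsm1.on_offer(0.4, all_resources=False, rejected_any=False)
    tsm1.on_offer(0.7, all_resources=False, rejected_any=False)
    return tsm1


timer = run_scenario()
print(timer.last_reset)  # 0.0: accepting both single offers did not reset it
timer.on_offer(1.0, all_resources=True, rejected_any=False)
print(timer.last_reset)  # 1.0: corrected by the next all-resource offer
```

   Under these assumptions the timer is stale between t=0.7 and t=1.0 and is only corrected by the next "all resource offer".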
   
   So the issue here is that we aren't really tracking when all resources are 
in use; we are only approximating it with a proxy.
   Actually calculating the free slots, though, is fairly complex once you take 
blacklisting into account (there is both application-level and taskset-level 
blacklisting).
   At this point I'm inclined to think the above case is OK: it favors not 
delaying, and it will be fixed up on the next "all resource offer".
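
   A rough sketch of why counting free slots gets complicated with blacklisting: the number of usable slots depends on which taskset is asking. This hypothetical Python helper assumes a simplified model with only executor-level blacklists, one for the whole application and one per taskset (real Spark blacklisting also works at node level and per task). `free_slots_for_taskset` and its parameters are illustrative, not Spark APIs.

```python
from typing import Dict, Set


def free_slots_for_taskset(
    free_cores: Dict[str, int],
    cores_per_task: int,
    app_blacklist: Set[str],
    taskset_blacklist: Set[str],
) -> int:
    """Count task slots one taskset could use right now (hypothetical helper).

    Simplified: only executor-level blacklists are modeled, one applied to the
    whole application and one applied to this taskset only.
    """
    slots = 0
    for executor_id, cores in free_cores.items():
        if executor_id in app_blacklist or executor_id in taskset_blacklist:
            continue  # this taskset cannot use this executor's free cores
        slots += cores // cores_per_task
    return slots


free = {"exec-1": 4, "exec-2": 2, "exec-3": 3}
app_bl = {"exec-2"}
# Same cluster state, different answers depending on the taskset asking.
print(free_slots_for_taskset(free, 2, app_bl, set()))       # 3
print(free_slots_for_taskset(free, 2, app_bl, {"exec-1"}))  # 1
```

   Even in this reduced form there is no single "free slots" number for the cluster, which is part of why a proxy is attractive.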
   
   One thing I'm not sure I like is that if you are fully scheduled, we keep 
making "all resource offers", but when there are no free resources we still 
reset the timer. This means it can take a long time to fall back to a less 
local level when you have multiple tasksets, the first taskset rejects an 
offer, the second one takes it, and tasks finish in such a way that an "all 
resource offer" lands between task completions. In this scenario the first 
taskset can get starved. We may need to track this separately.
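
   The starvation case can also be simulated. This hypothetical Python sketch assumes a fully scheduled cluster where a task finishes at t + 0.5 every second, the freed slot is offered to the first taskset (which rejects it for locality) before the second taskset takes it, and the periodic "all resource offer" at each whole second finds no free slots. The `track_empty_offers_separately` flag only illustrates one possible way to "track this separately" (skip the reset when the offer had no free slots); it is not something the PR implements, and `LOCALITY_WAIT = 3.0` is an arbitrary value.

```python
from dataclasses import dataclass

LOCALITY_WAIT = 3.0  # seconds; arbitrary value chosen for illustration


@dataclass
class StarvedTimer:
    """Hypothetical locality-wait timer for the first taskset (TSM1)."""

    last_reset: float = 0.0
    track_empty_offers_separately: bool = False

    def on_all_resource_offer(self, now: float, free_slots: int, rejected_any: bool) -> None:
        if rejected_any:
            return  # a locality rejection never resets the timer
        if free_slots == 0 and self.track_empty_offers_separately:
            return  # variant: an empty offer says nothing about wasted resources
        self.last_reset = now

    def can_fall_back(self, now: float) -> bool:
        return now - self.last_reset >= LOCALITY_WAIT


def simulate(track_separately: bool, until: float = 10.0) -> bool:
    """Return True if TSM1 is ever allowed to fall back to a less local level."""
    timer = StarvedTimer(track_empty_offers_separately=track_separately)
    t = 0.0
    while t < until:
        # At t + 0.5 a task finishes. The freed slot is offered to TSM1 first;
        # unless its wait has expired, TSM1 rejects it and TSM2 takes it.
        if timer.can_fall_back(t + 0.5):
            return True
        # At t + 1.0 the periodic all-resource offer finds every slot busy,
        # so TSM1 has nothing to reject.
        timer.on_all_resource_offer(t + 1.0, free_slots=0, rejected_any=False)
        t += 1.0
    return False


print(simulate(track_separately=False))  # False: the timer is reset every second
print(simulate(track_separately=True))   # True: TSM1 can fall back when a slot frees at t = 3.5
```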
   
   I also need to walk through all the scenarios again.


