[ 
https://issues.apache.org/jira/browse/HADOOP-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665701#action_12665701
 ] 

Owen O'Malley commented on HADOOP-5003:
---------------------------------------

It is still a bug. It just implies that it is unreasonable to have the reclaim 
timer be shorter than the TaskTracker timeout. That makes sense. You can't 
enforce an SLA that is tighter than the accuracy of the information about the 
cluster.

Now back to your example. A user of Q2 doesn't see that TT1 is down. He sees 
that he is allocated 50% and that he isn't getting the 5 slots he should. He 
gets mad that no timer is running to get him his slot back.

Take-home messages:
  1. We should probably warn the user if an SLA time is less than the 
TaskTracker timeout.
  2. We need to continue to truncate when computing guaranteed capacity.
  3. We should remove the limitation that the timer can only start when another 
queue is over capacity.
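
A minimal sketch of the warning in point 1, not the scheduler's actual code. 
The class, the method names, and the use of milliseconds are all hypothetical; 
the only thing taken from the discussion is the comparison itself: a reclaim 
(SLA) time shorter than the TaskTracker timeout is worth a warning.

```java
// Hypothetical helper, not part of the Capacity Scheduler.
final class ReclaimTimeCheck {

    // True when the queue's reclaim time is tighter than the TaskTracker
    // timeout, i.e. tighter than the scheduler's knowledge of the cluster.
    static boolean isReclaimTimeTooShort(long reclaimTimeMs, long trackerTimeoutMs) {
        return reclaimTimeMs < trackerTimeoutMs;
    }

    // Builds the warning text for a queue (queue name is illustrative).
    static String warningFor(String queue, long reclaimTimeMs, long trackerTimeoutMs) {
        return "Queue " + queue + ": reclaim time " + reclaimTimeMs
                + " ms is less than the TaskTracker timeout "
                + trackerTimeoutMs + " ms";
    }

    public static void main(String[] args) {
        long reclaimTimeMs = 5_000L;      // assumed value for illustration
        long trackerTimeoutMs = 10_000L;  // assumed value for illustration
        if (isReclaimTimeTooShort(reclaimTimeMs, trackerTimeoutMs)) {
            System.err.println(warningFor("Q2", reclaimTimeMs, trackerTimeoutMs));
        }
    }
}
```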

> When computing absolute guaranteed capacity (GC) from a percent value, 
> Capacity Scheduler should round up floats, rather than truncate them.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5003
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5003
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Priority: Minor
>
> The Capacity Scheduler calculates a queue's absolute GC value by getting its 
> percent of the total cluster capacity (which is a float, since the configured 
> GC% is a float) and casting it to an int. Casting a float to an int 
> truncates toward zero, so a non-negative value is always rounded down. For 
> very small clusters, this can result in the GC of a queue 
> being one lower than what it should be. For example, if Q1 has a GC of 50%, 
> Q2 has a GC of 40%, and Q3 has a GC of 10%, and if the cluster capacity is 4 
> (as we have, in our test cases), Q1's GC works out to 2, Q2's to 1, and Q3's 
> to 0 with today's code. Q2's capacity should really be 2, as 40% of 4 is 
> 1.6, which rounds up to 2. 
> The simple fix is to use Math.round() rather than a cast to int. 
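
The arithmetic in the description can be checked with a short sketch. The class 
and method names are hypothetical and this is not the scheduler's code; it only 
contrasts an int cast with Math.round() for the 50%/40%/10% queues on a cluster 
of capacity 4.

```java
// Hypothetical illustration of the two ways to turn a GC percent into slots.
final class GcRounding {

    // Behavior described in the issue: the cast drops the fractional part.
    static int truncatedGc(float percent, int clusterCapacity) {
        return (int) (percent * clusterCapacity / 100f);
    }

    // Proposed fix: round to the nearest int with Math.round(float).
    static int roundedGc(float percent, int clusterCapacity) {
        return Math.round(percent * clusterCapacity / 100f);
    }

    public static void main(String[] args) {
        float[] percents = {50f, 40f, 10f}; // Q1, Q2, Q3 from the description
        int clusterCapacity = 4;
        for (int i = 0; i < percents.length; i++) {
            System.out.printf("Q%d: cast=%d round=%d%n", i + 1,
                    truncatedGc(percents[i], clusterCapacity),
                    roundedGc(percents[i], clusterCapacity));
        }
    }
}
```

With these inputs the cast gives 2, 1, 0 and Math.round() gives 2, 2, 0. Note 
that rounding does not always sum to the cluster capacity: two queues at 50% on 
a cluster of capacity 3 each round 1.5 to 2, for a total of 4.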

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
