[ https://issues.apache.org/jira/browse/HADOOP-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665908#action_12665908 ]
Hemanth Yamijala commented on HADOOP-5003:
------------------------------------------
I think it is reasonable to check that the reclaim time limit is one that we
can actually meet. The lost tasktracker detection interval is a period during
which the cluster state may be inaccurate, and we know there are cases when we
cannot meet the limit, so I think it is reasonable to have users set a limit
that is larger than this interval. Note that this interval is also configurable.
If users feel that the default of 10 minutes is too high, the cluster could
potentially be configured with a smaller value for that parameter as well
(after doing due diligence on the side effects).
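
The check suggested above could be sketched roughly as follows. This is only an
illustration, not the Capacity Scheduler's code: the class name, the method
name, and both parameters are hypothetical, and the only value taken from the
discussion is the default detection interval of 10 minutes.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Hadoop.
final class ReclaimLimitCheck {
    // Default lost tasktracker detection interval mentioned above: 10 minutes.
    static final long DEFAULT_LOST_TRACKER_INTERVAL_MS = TimeUnit.MINUTES.toMillis(10);

    // Rejects a reclaim time limit that is not larger than the detection interval.
    static void validate(long reclaimTimeLimitMs, long lostTrackerIntervalMs) {
        if (reclaimTimeLimitMs <= lostTrackerIntervalMs) {
            throw new IllegalArgumentException(
                "Reclaim time limit (" + reclaimTimeLimitMs + " ms) must be larger than "
                + "the lost tasktracker detection interval (" + lostTrackerIntervalMs + " ms)");
        }
    }

    public static void main(String[] args) {
        // A 15-minute limit passes against the 10-minute default.
        validate(TimeUnit.MINUTES.toMillis(15), DEFAULT_LOST_TRACKER_INTERVAL_MS);
        System.out.println("15-minute reclaim limit accepted");
    }
}
```

A cluster configured with a shorter detection interval would pass that value as
the second argument, which in turn allows a shorter reclaim time limit.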
Whether 10 minutes is too high depends on the profile of the jobs: for small
jobs it might be, but for larger jobs (which take significantly more time to
run), a difference of a few minutes doesn't seem like it would matter much.
> When computing absolute guaranteed capacity (GC) from a percent value,
> Capacity Scheduler should round up floats, rather than truncate them.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-5003
> URL: https://issues.apache.org/jira/browse/HADOOP-5003
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vivek Ratan
> Priority: Minor
>
> The Capacity Scheduler calculates a queue's absolute GC value by getting its
> percent of the total cluster capacity (which is a float, since the configured
> GC% is a float) and casting it to an int. Casting a float to an int discards
> the fractional part. For very small clusters, this can result in the GC of a queue
> being one lower than what it should be. For example, if Q1 has a GC of 50%,
> Q2 has a GC of 40%, and Q3 has a GC of 10%, and if the cluster capacity is 4
> (as we have, in our test cases), Q1's GC works out to 2, Q2's to 1, and Q3's
> to 0 with today's code. Q2's capacity should really be 2, since 40% of 4 is
> 1.6, which rounds up to 2.
> A simple fix is to use Math.round() rather than a cast to int.
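
The arithmetic in the description can be reproduced with a short sketch. The
class and method names below are hypothetical and are not the scheduler's
actual code; only the percentages (50, 40, 10) and the cluster capacity of 4
come from the report.

```java
// Hypothetical sketch comparing an int cast with Math.round().
final class GuaranteedCapacityDemo {
    // Current behaviour as described: the cast drops the fractional part.
    static int truncated(float gcPercent, int clusterCapacity) {
        return (int) (gcPercent * clusterCapacity / 100);
    }

    // Proposed behaviour: Math.round(float) rounds to the nearest int.
    static int rounded(float gcPercent, int clusterCapacity) {
        return Math.round(gcPercent * clusterCapacity / 100);
    }

    public static void main(String[] args) {
        float[] percents = {50.0f, 40.0f, 10.0f}; // Q1, Q2, Q3
        int clusterCapacity = 4;
        for (int i = 0; i < percents.length; i++) {
            System.out.println("Q" + (i + 1)
                + ": cast=" + truncated(percents[i], clusterCapacity)
                + ", round=" + rounded(percents[i], clusterCapacity));
        }
        // Q1: cast=2, round=2
        // Q2: cast=1, round=2
        // Q3: cast=0, round=0
    }
}
```

Note that rounding each queue independently does not guarantee that the
per-queue values add up to the cluster capacity; for instance, two queues at
50% each on a cluster of capacity 3 would both round to 2.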