[
https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634403#action_12634403
]
Hemanth Yamijala commented on HADOOP-4211:
------------------------------------------
The scenario in which this problem occurs is slightly different from what is
described. The actual scenario is as follows:
- Suppose we have n slots in the system, and 25% is the minimum user limit.
- When we submit 3 jobs as 3 different users one after the other, in steady
state, each user gets n/3 slots.
- Let these 3 jobs complete.
- Now, submit 2 more jobs as 2 different users.
- The expectation is that each user gets n/2 slots in steady state. However, the
first user gets 2n/3 slots and the other user gets n/3 slots.
The reason for this behavior is directly related to HADOOP-4053. Currently,
there is no notification to the schedulers that a job has completed.
In the {{CapacityTaskScheduler}}, the limit is computed as follows:
{code}
limit = Math.max(
    (int) Math.ceil((double) currentCapacity / (double) qsi.numJobsByUser.size()),
    (int) Math.ceil((double) (qsi.ulMin * currentCapacity) / 100.0));
{code}
A user is added to the map {{numJobsByUser}} when a job is added. The intent
was that the user is removed from this map upon job completion. However, since
this event is not yet raised, the number of users is not correctly updated. As
a result, the limit is still computed as n/3 instead of n/2. Currently, if
all users have hit the limit, the first user with running jobs is given
any remaining slots, which explains the observed behavior.
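To make the arithmetic concrete, here is a minimal sketch of the effect. It is not the scheduler's code: the class {{UserLimitSketch}} and its methods {{jobAdded}}, {{jobCompleted}} and {{allocate}} are hypothetical names, and only the limit formula is taken from the snippet above. It assumes n = 204 slots and a 25% user limit (the reported environment), that each job saturates its user's share, and that the two later jobs come from two of the original three users, which is the case where the stale map still has three entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-queue state; not the real scheduler class.
class UserLimitSketch {
    // user -> number of jobs, playing the role of numJobsByUser
    final Map<String, Integer> numJobsByUser = new HashMap<>();

    void jobAdded(String user) {
        numJobsByUser.merge(user, 1, Integer::sum);
    }

    // Models the completion notification that is currently missing:
    // decrement the user's job count and drop the user at zero.
    void jobCompleted(String user) {
        numJobsByUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
    }

    // Same formula as in the snippet above, with ulMin as a percentage.
    int limit(int currentCapacity, int ulMin) {
        return Math.max(
            (int) Math.ceil((double) currentCapacity / (double) numJobsByUser.size()),
            (int) Math.ceil((double) (ulMin * currentCapacity) / 100.0));
    }

    // Assumed allocation rule: each active user gets up to `limit` slots,
    // and any leftover slots go to the first user with running jobs.
    static int[] allocate(int capacity, int limit, int activeUsers) {
        int[] slots = new int[activeUsers];
        int remaining = capacity;
        for (int i = 0; i < activeUsers; i++) {
            slots[i] = Math.min(limit, remaining);
            remaining -= slots[i];
        }
        slots[0] += remaining;
        return slots;
    }

    public static void main(String[] args) {
        String[] firstRound = {"user1", "user2", "user3"};

        // Without completion events: the three earlier users are never removed.
        UserLimitSketch stale = new UserLimitSketch();
        for (String u : firstRound) stale.jobAdded(u);
        stale.jobAdded("user1");
        stale.jobAdded("user2");
        int staleLimit = stale.limit(204, 25);
        int[] staleSlots = allocate(204, staleLimit, 2);
        System.out.println(staleLimit + " -> " + staleSlots[0] + "/" + staleSlots[1]);

        // With completion events: users are removed before the second round.
        UserLimitSketch fixed = new UserLimitSketch();
        for (String u : firstRound) fixed.jobAdded(u);
        for (String u : firstRound) fixed.jobCompleted(u);
        fixed.jobAdded("user1");
        fixed.jobAdded("user2");
        int fixedLimit = fixed.limit(204, 25);
        int[] fixedSlots = allocate(204, fixedLimit, 2);
        System.out.println(fixedLimit + " -> " + fixedSlots[0] + "/" + fixedSlots[1]);
    }
}
```

Under these assumptions the stale map yields a limit of 68 (n/3) and a 136/68 split, that is 2n/3 and n/3, while removing users on completion yields a limit of 102 (n/2) and an even 102/102 split.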
In summary, if HADOOP-4053 is fixed, this issue will automatically get fixed.
In fact, I applied the patch currently available on HADOOP-4053 and verified
the behavior is correct now. That is, the limit is recomputed correctly.
I discussed this with Karam, and we agree that the observations are correct.
I'll mark HADOOP-4053 as a blocker for this bug. When that gets committed, Karam
can try this out again and close this bug.
> Capacity Scheduler does not divide queue resources properly among users, when
> jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4211
> URL: https://issues.apache.org/jira/browse/HADOOP-4211
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces.
> User limit =25% and only one queue.
> Reporter: Karam Singh
> Assignee: Hemanth Yamijala
> Priority: Blocker
> Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources properly among users
> when jobs are submitted one after the other. E.g. user limit = 25%. Say user1's
> job is running. Then user2 submits a job. Then user1's job uses 75% and user2's
> job uses 25% (the user limit).