[
https://issues.apache.org/jira/browse/HADOOP-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647593#action_12647593
]
Vivek Ratan commented on HADOOP-4658:
-------------------------------------
There are a couple of things going on here.
The scheduler is notified by the JT when a job is considered 'done' (see
HADOOP-4053). At that point, we recompute the number of users who have
submitted jobs in the queue. If this notification is coming in late, you need
to see why. But the scheduler's behavior is right in terms of computing the
user limit. The scheduler doesn't decide when a job is 'done'. The JT does. And
that, IMO, is correct behavior.
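To make the counting concrete, here's a rough sketch of the bookkeeping involved. The names and the exact limit formula below are my simplification for illustration, not the actual CapacityTaskScheduler code:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and the limit formula are assumptions,
// not the real CapacityTaskScheduler implementation.
class QueueUserLimits {
  private final int capacity;          // slots guaranteed to this queue
  private final int userLimitPercent;  // e.g. 25 for a 25% user limit
  private final Set<String> activeUsers = new HashSet<String>();

  QueueUserLimits(int capacity, int userLimitPercent) {
    this.capacity = capacity;
    this.userLimitPercent = userLimitPercent;
  }

  void jobSubmitted(String user) {
    activeUsers.add(user);
  }

  // Called only when the JT notifies the scheduler that a job is 'done'
  // (see HADOOP-4053); until then the user still counts toward the limit.
  void jobCompleted(String user, boolean userHasOtherJobsInQueue) {
    if (!userHasOtherJobsInQueue) {
      activeUsers.remove(user);
    }
  }

  // One plausible per-user cap: the larger of the configured minimum share
  // and an even split of capacity among the users currently in the queue.
  int slotsPerUser() {
    int users = Math.max(1, activeUsers.size());
    return Math.max(capacity * userLimitPercent / 100, capacity / users);
  }
}
{code}

So if the 'done' notification arrives late, activeUsers stays larger than it should be for that window, and the per-user limit stays correspondingly small.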
Secondly, if we walk through the entire queue and no job can accept a slot
(likely because, as in this case, jobs are either over capacity or don't have
tasks to run) and there are no waiting tasks, we still want the slot to be used
by the queue, so we walk through the queue again, this time without considering
user limits. Likely, the first job will start getting a bunch of slots. This is
by design, as it's really hard to argue what is fair in this case. Taking one
of your examples, suppose the queue capacity is 100 and we have four jobs from
four different users. Each is using 25 slots. J3 starts finishing up, and at
some point, is only running, say, 5 tasks. Also assume there are no waiting
jobs. Now, what's the right behavior? Who should get the slot? J1, right? Who
gets the next slot? You can argue that you want to redistribute J3's 20 unused
slots among J1, J2, and J4, but this recomputation gets really complicated. So,
we took a simpler approach by saying that free slots are offered to jobs in
order. So J1 will get a bunch of free slots, which will let it finish fast.
Eventually J3 finishes, is taken out of the queue, and user limits are
recomputed. We couldn't think of a simpler, fairer approach here. Note
that this situation is a bit rare. On a regular, well-utilized cluster, you'll
have a bunch of waiting jobs and they will start running. J3's user's first
waiting job will start running, which is the right thing.
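For concreteness, the two-pass walk reads roughly like the sketch below. Job, Task and the helper methods are placeholders, not the real scheduler types:

{code:java}
import java.util.List;

// Sketch of the two-pass slot assignment described above; all types and
// method names are placeholders, not actual CapacityTaskScheduler code.
class SlotAssigner {

  interface Task {}

  interface Job {
    String getUser();
    boolean hasRunnableTask();
    Task obtainTask();
  }

  // Offer one free slot to the queue's jobs, in order.
  Task assignSlot(List<Job> jobsInQueue) {
    // Pass 1: respect per-user limits.
    for (Job j : jobsInQueue) {
      if (withinUserLimit(j.getUser()) && j.hasRunnableTask()) {
        return j.obtainTask();
      }
    }
    // Pass 2: no job could take the slot under the limits, but we still want
    // this queue to use it, so walk the queue again ignoring user limits.
    // In practice the first job with work ends up collecting the free slots.
    for (Job j : jobsInQueue) {
      if (j.hasRunnableTask()) {
        return j.obtainTask();
      }
    }
    return null; // nothing at all to run in this queue
  }

  private boolean withinUserLimit(String user) {
    // compare the user's running slots against the current per-user limit
    return true; // placeholder
  }
}
{code}

In the example above, once J3 is down to 5 running tasks, pass 1 finds nothing (J1, J2 and J4 are at their 25-slot limits, and J3 has no pending tasks), so pass 2 hands the freed slots to J1 until J3 finishes and the user limits are recomputed.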
So, in summary, I'd say this is expected behavior.
As for your comment on speculative tasks being run by J1 - that's really a
different call. J1 runs a speculative task if it legitimately has a speculative
task to run, not because there's a slot free. J1 can come back and say it
doesn't have any task to run, in which case J2 is looked at next. If J1 is
running 17 speculative tasks, they're just as genuine as, and higher priority
than, J2's tasks, so I'd say that's still the right behavior.
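In other words, the choice of task sits with the job, not with the scheduler. A rough sketch of that job-side decision, with purely illustrative names:

{code:java}
// Illustrative only: a job returns a speculative attempt when it legitimately
// has a task worth speculating, never just because a slot happens to be free.
interface SchedulableJob {
  Runnable pendingTask();            // an unlaunched normal task, or null
  boolean hasTaskWorthSpeculating(); // e.g. a straggling attempt
  Runnable speculativeAttempt();     // a duplicate attempt for the slow task
}

class TaskPicker {
  // Returning null means "this job has nothing", and the next job is asked.
  Runnable pickTask(SchedulableJob job) {
    Runnable t = job.pendingTask();
    if (t == null && job.hasTaskWorthSpeculating()) {
      t = job.speculativeAttempt();
    }
    return t;
  }
}
{code}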
> User limit is not expanding back properly.
> ------------------------------------------
>
> Key: HADOOP-4658
> URL: https://issues.apache.org/jira/browse/HADOOP-4658
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Environment: GC=100% nodes=104, map_capacity=208,
> reduce_capacity=208, user-limit=25%;
> Reporter: Karam Singh
> Assignee: Amar Kamat
>
> User limit is not expanding back properly.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.