[
https://issues.apache.org/jira/browse/HADOOP-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641274#action_12641274
]
Vivek Ratan commented on HADOOP-4428:
-------------------------------------
bq. We don't look at jobs in any other queue, if the job currently initialized
is not yet running.
I wanted to consider this in some detail. When initTasks() returns, the job is
still not in a running state, because its setup task needs to run first. For
the sake of discussion, assume we have queues Q1, Q2 ... Qn and that we're
considering queues in that order, starting with Q1. So after we call
initTasks() on a job in Q1 (say, J1), we have the following options for finding
a task to run:
1. We can look at the next job in Q1. This is not a good option since we'll face
the same situation: we'll call initTasks() for the next job, then look at the
job after that, and so on.
2. We can look at jobs in the next queue. This is a viable option. It does seem
a bit unfair, because you're penalizing Q1 for the time it takes J1's setup
task to run, but you could equally well argue that this unfairness is temporary
and applies equally to all queues.
3. We can return nothing to the TT. As a result, all TTs that send heartbeats
to the JT while J1's setup task is running will get nothing to run. Most setup
tasks should take only a couple of heartbeats to run, so this won't be a
frequent problem, but if the setup task contains user code that does a lot of
work, the problem is exacerbated.
Upon further reflection, I'd argue for the second approach where we move on to
the next queue. Returning nothing to the TTs causes unnecessary
under-utilization.
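To make the second approach concrete, here is a rough sketch of the queue walk
I have in mind. It is not the capacity scheduler's actual code: the classes
Job, JobQueue and QueueSkipSketch, the INITIALIZED state and the pickJob()
method are all hypothetical stand-ins, only the head job of each queue is
considered, and the setup task is assumed to be handed out somewhere else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are NOT the real Hadoop classes.
class QueueSkipSketch {
    enum JobState { PREP, INITIALIZED, RUNNING }

    static class Job {
        final String name;
        JobState state;
        Job(String name, JobState state) { this.name = name; this.state = state; }
        // Stand-in for initTasks(): afterwards the job is initialized, but it
        // is not RUNNING until its setup task has completed.
        void initTasks() { state = JobState.INITIALIZED; }
    }

    static class JobQueue {
        final String name;
        final List<Job> jobs = new ArrayList<>();
        JobQueue(String name) { this.name = name; }
    }

    // Walks the queues in order (Q1, Q2, ... Qn) and returns the name of the
    // first head job that is RUNNING, or null if no queue has one.
    static String pickJob(List<JobQueue> queues) {
        for (JobQueue q : queues) {
            if (q.jobs.isEmpty()) {
                continue;
            }
            Job head = q.jobs.get(0);
            if (head.state == JobState.PREP) {
                head.initTasks();
            }
            if (head.state == JobState.RUNNING) {
                return head.name;
            }
            // Option 2: the head job is waiting on its setup task, so move on
            // to the next queue instead of the next job in this queue
            // (option 1) or returning nothing to the TT (option 3).
        }
        return null;
    }
}
```

With Q1 holding an uninitialized J1 and Q2 holding a running J2, pickJob()
initializes J1 and then hands back J2, so the TT still gets work while J1's
setup task is pending. It returns null only when no queue has a running job at
its head.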
The right way to do things, IMO, is to get the setup/cleanup tasks out of
initTasks(), which I'll argue elsewhere, but this problem (of initTasks() not
necessarily changing the job's state to RUNNING) can arise again if we decide
to call initTasks() in a separate thread, the way it's done in the default
scheduler.
> Job Priorities are not handled properly
> ----------------------------------------
>
> Key: HADOOP-4428
> URL: https://issues.apache.org/jira/browse/HADOOP-4428
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Environment: Cluster: 106 TTs MapCapacity=212, ReduceCapacity=212
> Single Queue=default, User Limit=25, Priorities = Yes.
> Using hadoop branch 0.19 revision=705159
> Reporter: Karam Singh
> Assignee: Vinod K V
> Priority: Blocker
> Fix For: 0.19.0
>
> Attachments: HADOOP-4428-20081017.1.txt, HADOOP-4428-20081020.txt,
> HADOOP-4428.patch
>
>
> Job Priorities are not handled properly