Job scheduling (Re: Unable to run more than one job concurrently)

Andrzej Bialecki Fri, 19 May 2006 03:12:16 -0700

Andrzej Bialecki wrote:

Hi all,
I'm running Hadoop on a relatively small cluster (5 nodes) with growing datasets.
I noticed that if I start a job that is configured to run more map tasks than the cluster capacity (mapred.tasktracker.tasks.maximum * number of nodes, 20 in this case), of course only that many map tasks will run, and when they are finished the next map tasks from that job will be scheduled.
However, when I try to start another job in parallel, only its reduce tasks will be scheduled (uselessly spin-waiting for map output, and only reducing the number of available task slots in the cluster...), and no map tasks from this job will be scheduled - until the first job completes. This feels wrong - not only am I not making progress on the second job, but I'm also taking slots away from the first job!
I'm somewhat miffed about this - I'd think that the jobtracker should split the available resources evenly between these two jobs, i.e. it should schedule some map tasks from the first job and some from the second one. This is not what is happening, though ...
Is this a configuration error, a bug, or a feature? :)

It seems it's a feature - I found the code in JobTracker.pollForNewTask(), and I'm not too happy about it.

Let's consider the following example: if I'm running a Nutch fetcher, the main limitation is the available bandwidth to fetch pages, and not the capacity of the cluster. I'd love to be able to execute other jobs in parallel, so that I don't have to wait until the fetcher completes. I could sacrifice some of the task slots on the tasktrackers for that other job, because the fetcher job wouldn't suffer from this anyway (at least not too much).

So, I'd like to change this code to pick a random job from the list jobsByArrival, and call job.obtainNewMapTask on that randomly selected job. Would that work? Additionally, if no map tasks from that job have been allocated, I'd like to skip adding reduce tasks from that job, later in lines 721-750.
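
To make the idea concrete, here is a rough standalone sketch of the selection logic. It does not touch the real JobTracker code: SketchJob, RandomJobPicker, assignMapTask and mayScheduleReduce are hypothetical names made up for illustration, and the real job.obtainNewMapTask takes arguments that are left out here. Only the list name jobsByArrival comes from the existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a job; this is NOT Hadoop's JobInProgress. */
class SketchJob {
    final String name;
    int pendingMaps;        // map tasks not yet handed out
    int allocatedMaps = 0;  // map tasks already handed out

    SketchJob(String name, int pendingMaps) {
        this.name = name;
        this.pendingMaps = pendingMaps;
    }

    /** Simplified placeholder: hands out one map task, returns false if none are left. */
    boolean obtainNewMapTask() {
        if (pendingMaps == 0) {
            return false;
        }
        pendingMaps--;
        allocatedMaps++;
        return true;
    }
}

/** Hypothetical helper showing random job selection instead of strict arrival order. */
class RandomJobPicker {
    private final Random random;

    RandomJobPicker(Random random) {
        this.random = random;
    }

    /** Picks a random job that still has pending maps and takes one map task from it. */
    SketchJob assignMapTask(List<SketchJob> jobsByArrival) {
        List<SketchJob> candidates = new ArrayList<>();
        for (SketchJob job : jobsByArrival) {
            if (job.pendingMaps > 0) {
                candidates.add(job);
            }
        }
        if (candidates.isEmpty()) {
            return null; // nothing left to schedule
        }
        SketchJob chosen = candidates.get(random.nextInt(candidates.size()));
        chosen.obtainNewMapTask();
        return chosen;
    }

    /** Reduce tasks are skipped for jobs that have no map task allocated yet. */
    static boolean mayScheduleReduce(SketchJob job) {
        return job.allocatedMaps > 0;
    }
}
```

With two jobs in the list, repeated calls to assignMapTask spread map slots across both on average, and mayScheduleReduce keeps a job's reduce tasks from occupying slots before any of its maps have started.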

Perhaps we should extend JobInProgress to include a priority, and implement something a la Unix scheduler.
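
A very loose sketch of what such a scheme might look like, assuming a nice-style base priority (lower value is favored) plus a usage counter that decays over time. All names here (PrioritizedJob, PriorityPicker, basePriority, recentUsage) are hypothetical; nothing like this exists in JobInProgress today, and a real Unix scheduler is considerably more involved.

```java
import java.util.List;

/** Hypothetical job with a nice-like priority; NOT an existing Hadoop class. */
class PrioritizedJob {
    final String name;
    final int basePriority; // lower value = more favored, similar to Unix nice
    int recentUsage = 0;    // task slots recently given to this job

    PrioritizedJob(String name, int basePriority) {
        this.name = name;
        this.basePriority = basePriority;
    }

    /** Jobs that recently received many slots become less attractive. */
    int effectivePriority() {
        return basePriority + recentUsage;
    }
}

/** Hypothetical picker: lowest effective priority wins, ties go to the earlier job. */
class PriorityPicker {
    static PrioritizedJob pickNext(List<PrioritizedJob> jobs) {
        PrioritizedJob best = null;
        for (PrioritizedJob job : jobs) {
            if (best == null || job.effectivePriority() < best.effectivePriority()) {
                best = job;
            }
        }
        if (best != null) {
            best.recentUsage++; // charge the job for the slot it just got
        }
        return best;
    }

    /** Called periodically so that old usage is gradually forgotten. */
    static void decay(List<PrioritizedJob> jobs) {
        for (PrioritizedJob job : jobs) {
            job.recentUsage /= 2;
        }
    }
}
```

The point of the usage term is that a favored job still gets most of the slots, but a lower-priority job is never starved completely, which is the behaviour I'm missing now.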


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
