Hi Mu,

Small-job overhead has been worked on a bit in recent versions, but here's
the gist of it (as best I know; I don't work much in this area of the
code):

- The JobTracker doesn't push tasks to TaskTrackers. Instead, the
TaskTrackers send heartbeats at a regular interval, and the JobTracker hands
out tasks in its responses. The minimum interval is 3 seconds
(MRConstants.HEARTBEAT_INTERVAL_MIN). For every 100 nodes above 300, that
interval increases by one second (MRConstants.CLUSTER_INCREMENT).

- Because of this, each task from the JobTracker can take up to 3 seconds to
get assigned to a TaskTracker.

- I believe that the TaskTrackers also do not report Task Completion Events
except as part of a Heartbeat. This means that after each task finishes,
there can be another 3 second delay before the JobTracker finds out about
it.

- Though these things seem inefficient, the reasoning is that, in a large
cluster of, say, 1000 nodes, the TTs could overwhelm the JobTracker if the
heartbeats were more frequent. With more nodes, some TT is also likely to
report a heartbeat soon after a task becomes pending, so the wait per task
tends to be small. Additionally, MapReduce is designed in general for large
jobs where the time spent processing a task far exceeds the scheduling
time.
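
The interval rule from the first bullet can be sketched roughly as follows.
This is a Python illustration, not the Hadoop source: the function name is
made up, the two constants only mirror the MRConstants names mentioned
above, and rounding a partial hundred of nodes up is an assumption.

```python
# Values as described above; names mirror MRConstants but this is only a sketch.
HEARTBEAT_INTERVAL_MIN_MS = 3 * 1000  # minimum heartbeat interval: 3 seconds
CLUSTER_INCREMENT = 100               # one extra second per 100 nodes


def heartbeat_interval_ms(cluster_size: int) -> int:
    """Heartbeat interval in milliseconds for a cluster of the given size.

    Assumption: partial hundreds of nodes round up, so 350 nodes gives 4 s.
    """
    # Integer ceiling of cluster_size / CLUSTER_INCREMENT.
    hundreds = (cluster_size + CLUSTER_INCREMENT - 1) // CLUSTER_INCREMENT
    return max(1000 * hundreds, HEARTBEAT_INTERVAL_MIN_MS)


for nodes in (10, 300, 400, 1000):
    print(nodes, heartbeat_interval_ms(nodes))
```

Under these assumptions, anything up to 300 nodes stays at the 3 second
minimum, and a 1000 node cluster would heartbeat every 10 seconds.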

Given all of these delays, plus various amounts of time taken in copying
your job JAR to and from HDFS, even an "empty" job can take many seconds.
Around 20 sounds about right from my experience.
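
As a rough illustration of how the heartbeat waits stack up, here is a
back-of-the-envelope sketch in Python. It assumes the 3 second minimum
interval, that all maps run in one wave followed by one reduce wave, and
that each wave waits up to one interval to be assigned and up to one more
before its completion is reported. The function name is hypothetical, and
JAR copying and any other per-job work are left out.

```python
HEARTBEAT_INTERVAL_S = 3  # minimum interval described above (small cluster)


def worst_case_heartbeat_wait_s(sequential_waves: int,
                                interval_s: int = HEARTBEAT_INTERVAL_S) -> int:
    """Upper bound on time spent waiting for heartbeats alone.

    Each wave of tasks can wait up to one interval to be assigned and up to
    one more interval before the JobTracker hears that it finished.
    """
    return sequential_waves * 2 * interval_s


# A job with one map wave followed by one reduce wave.
empty_job_wait = worst_case_heartbeat_wait_s(sequential_waves=2)
print(empty_job_wait)  # 12
```

So under these assumptions heartbeat waits alone could account for up to 12
seconds of an otherwise empty job.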

Hope that helps.
-Todd


On Sun, Jul 12, 2009 at 9:52 PM, Mu Qiao <qiao...@gmail.com> wrote:

> Hi, everyone
>
> I've tested the hadoop environment I've set up. I noticed that it takes 24s
> to run a 2 mapper, 1 reducer job with empty input.
> Is that a reasonable time for a do-nothing job? Why does it take so long?
>
> Thanks
>
> --
> Best wishes,
> Qiao Mu
> MOE KLINNS Lab and SKLMS Lab, Xi'an Jiaotong University
> Department of Computer Science and Technology, Xi’an Jiaotong University
> TEL: 15991676983
> E-mail: qiao...@gmail.com
>
