Reading assignTasks() in 0.20.2 reveals that the number of map tasks
assigned is not limited to 1 per heartbeat.

Cheers

On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Abhishek,
>
> This behavior is improved by MAPREDUCE-706 I believe (not certain that
> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
> patches are included in CDH3 (currently in beta)
> http://archive.cloudera.com/cdh/3/
>
> In general, though, map tasks that are so short are not going to be very
> efficient - even with fast assignment there is some constant overhead per
> task.
>
> Thanks
> -Todd
>
> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absha...@usc.edu>
> wrote:
>
> > Hi all,
> >
> > I have been using the Hadoop Fair Scheduler for some experiments on a
> > 100 node cluster with 2 map slots per node (hence, a total of 200 map
> > slots).
> >
> > In one of my experiments, all the map tasks finish within a heartbeat
> > interval of 3 seconds. I noticed that the maximum number of
> > concurrently
> > active map slots on my cluster never exceeds 100, and hence, the
> > cluster utilization during my experiments never exceeds 50% even when
> > large jobs with more than a 1000 maps are being executed.
> >
> > A look at the Fair Scheduler code (in particular, the assignTasks
> > function) revealed the reason.
> > As per my understanding, with the implementation in Hadoop 0.20.0, a
> > TaskTracker is not assigned more than 1 map and 1 reduce task per
> > heart beat.
> >
> > In my experiments, in every heart beat, each TT has 2 free map slots
> > but is assigned only 1 map task, and hence, the utilization never goes
> > beyond 50%.
> >
> > Of course, this (degenerate) case does not arise when map tasks take
> > more than one 1 heart beat interval to finish. For example, I repeated
> > the experiments with maps tasks taking close to 15 s to finish and
> > noticed close to 100 % utilization when large jobs were executing.
> >
> > Why does the Fair Scheduler not assign more than one map task to a TT
> > per heart beat? Is this done to spread the load uniformly across the
> > cluster?
> > I looked at assignTasks function in the default Hadoop scheduler
> > (JobQueueTaskScheduler.java), and it does assign more than 1 map task
> > per heart beat to a TT.
> >
> > It will be easy to change the Fair Scheduler to assign more than 1 map
> > task to a TT per heart beat (I did that and achieved 100% utilization
> > even with small map tasks). But I am wondering, if doing so will
> > violate some fairness properties.
> >
> > Thanks,
> > Abhishek
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Reply via email to