Reading assignTasks() in 0.20.2 reveals that the number of map tasks assigned is not limited to 1 per heartbeat.
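For intuition, here is a minimal back-of-the-envelope simulation (a sketch only, not actual FairScheduler code; the class and method names are made up, and the tracker/slot counts are taken from the experiment described below) showing why a 1-task-per-heartbeat cap pins utilization at 50% when every task finishes within one heartbeat interval:

```java
// Sketch (not FairScheduler code): steady-state map-slot utilization when
// every map task finishes within one heartbeat interval, so all slots are
// free again by the time the next heartbeat arrives.
public class HeartbeatCapSim {

    // Slots kept busy per heartbeat: each TaskTracker is assigned at most
    // `cap` map tasks per heartbeat, bounded by its free slots.
    static int busySlotsPerBeat(int trackers, int slotsPerTracker, int cap) {
        int busy = 0;
        for (int tt = 0; tt < trackers; tt++) {
            int freeSlots = slotsPerTracker;   // all previous tasks finished
            busy += Math.min(freeSlots, cap);  // per-heartbeat assignment cap
        }
        return busy;
    }

    public static void main(String[] args) {
        int trackers = 100, slots = 2;         // 100 nodes x 2 map slots = 200
        int total = trackers * slots;
        // Cap of 1 map task per TT per heartbeat (0.20.0 FairScheduler)
        System.out.println(100.0 * busySlotsPerBeat(trackers, slots, 1) / total + "%");
        // Filling every free slot per heartbeat (as JobQueueTaskScheduler does)
        System.out.println(100.0 * busySlotsPerBeat(trackers, slots, 2) / total + "%");
    }
}
```

With the cap of 1, only 100 of the 200 slots are ever busy (50.0%); letting the scheduler fill all free slots per heartbeat reaches 100.0%.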
Cheers

On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Abhishek,
>
> This behavior is improved by MAPREDUCE-706, I believe (not certain that
> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
> patches are included in CDH3 (currently in beta):
> http://archive.cloudera.com/cdh/3/
>
> In general, though, map tasks that are so short are not going to be very
> efficient - even with fast assignment there is some constant overhead per
> task.
>
> Thanks
> -Todd
>
> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absha...@usc.edu> wrote:
>
> > Hi all,
> >
> > I have been using the Hadoop Fair Scheduler for some experiments on a
> > 100-node cluster with 2 map slots per node (hence, a total of 200 map
> > slots).
> >
> > In one of my experiments, all the map tasks finish within a heartbeat
> > interval of 3 seconds. I noticed that the maximum number of
> > concurrently active map slots on my cluster never exceeds 100, and
> > hence, the cluster utilization during my experiments never exceeds 50%,
> > even when large jobs with more than 1000 maps are being executed.
> >
> > A look at the Fair Scheduler code (in particular, the assignTasks
> > function) revealed the reason. As per my understanding, with the
> > implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
> > than 1 map and 1 reduce task per heartbeat.
> >
> > In my experiments, in every heartbeat, each TT has 2 free map slots
> > but is assigned only 1 map task, and hence, the utilization never goes
> > beyond 50%.
> >
> > Of course, this (degenerate) case does not arise when map tasks take
> > more than one heartbeat interval to finish. For example, I repeated
> > the experiments with map tasks taking close to 15 s to finish and
> > observed close to 100% utilization when large jobs were executing.
> >
> > Why does the Fair Scheduler not assign more than one map task to a TT
> > per heartbeat? Is this done to spread the load uniformly across the
> > cluster? I looked at the assignTasks function in the default Hadoop
> > scheduler (JobQueueTaskScheduler.java), and it does assign more than
> > 1 map task per heartbeat to a TT.
> >
> > It would be easy to change the Fair Scheduler to assign more than 1
> > map task to a TT per heartbeat (I did that and achieved 100%
> > utilization even with small map tasks). But I am wondering if doing
> > so would violate some fairness properties.
> >
> > Thanks,
> > Abhishek
>
> --
> Todd Lipcon
> Software Engineer, Cloudera