Hi Abhishek, This behavior is improved by MAPREDUCE-706 I believe (not certain that that's the JIRA, but I know it's fixed in trunk fairscheduler). These patches are included in CDH3 (currently in beta) http://archive.cloudera.com/cdh/3/
In general, though, map tasks that are so short are not going to be very efficient - even with fast assignment there is some constant overhead per task. Thanks -Todd On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absha...@usc.edu> wrote: > Hi all, > > I have been using the Hadoop Fair Scheduler for some experiments on a > 100 node cluster with 2 map slots per node (hence, a total of 200 map > slots). > > In one of my experiments, all the map tasks finish within a heartbeat > interval of 3 seconds. I noticed that the maximum number of > concurrently > active map slots on my cluster never exceeds 100, and hence, the > cluster utilization during my experiments never exceeds 50% even when > large jobs with more than a 1000 maps are being executed. > > A look at the Fair Scheduler code (in particular, the assignTasks > function) revealed the reason. > As per my understanding, with the implementation in Hadoop 0.20.0, a > TaskTracker is not assigned more than 1 map and 1 reduce task per > heart beat. > > In my experiments, in every heart beat, each TT has 2 free map slots > but is assigned only 1 map task, and hence, the utilization never goes > beyond 50%. > > Of course, this (degenerate) case does not arise when map tasks take > more than one 1 heart beat interval to finish. For example, I repeated > the experiments with maps tasks taking close to 15 s to finish and > noticed close to 100 % utilization when large jobs were executing. > > Why does the Fair Scheduler not assign more than one map task to a TT > per heart beat? Is this done to spread the load uniformly across the > cluster? > I looked at assignTasks function in the default Hadoop scheduler > (JobQueueTaskScheduler.java), and it does assign more than 1 map task > per heart beat to a TT. > > It will be easy to change the Fair Scheduler to assign more than 1 map > task to a TT per heart beat (I did that and achieved 100% utilization > even with small map tasks). But I am wondering, if doing so will > violate some fairness properties. > > Thanks, > Abhishek > -- Todd Lipcon Software Engineer, Cloudera