Hi Ted,

I was referring to version 0.20.0. As Todd noted, the issue I
described was fixed in version 0.20.2.

I only looked at the Cloudera version 0.20.2+228
(http://archive.cloudera.com/cdh/3/) currently in beta.

I guess Hadoop 0.20.2 also has the fix. I will take a look at that too.

Thanks,
Abhishek

On Sun, Apr 11, 2010 at 2:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Reading assignTasks() in 0.20.2 reveals that the number of map tasks
> assigned is not limited to 1 per heartbeat.
>
> Cheers
>
> On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hi Abhishek,
>>
>> This behavior is improved by MAPREDUCE-706 I believe (not certain that
>> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
>> patches are included in CDH3 (currently in beta)
>> http://archive.cloudera.com/cdh/3/
>>
>> In general, though, map tasks that are so short are not going to be very
>> efficient - even with fast assignment there is some constant overhead per
>> task.
>>
>> Thanks
>> -Todd
>>
>> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absha...@usc.edu>
>> wrote:
>>
>> > Hi all,
>> >
>> > I have been using the Hadoop Fair Scheduler for some experiments on a
>> > 100 node cluster with 2 map slots per node (hence, a total of 200 map
>> > slots).
>> >
>> > In one of my experiments, all the map tasks finish within a heartbeat
>> > interval of 3 seconds. I noticed that the maximum number of
>> > concurrently active map slots on my cluster never exceeds 100, and
>> > hence, the cluster utilization during my experiments never exceeds 50%
>> > even when large jobs with more than 1000 maps are being executed.
>> >
>> > A look at the Fair Scheduler code (in particular, the assignTasks
>> > function) revealed the reason.
>> > As per my understanding, with the implementation in Hadoop 0.20.0, a
>> > TaskTracker is not assigned more than 1 map and 1 reduce task per
>> > heart beat.
>> >
>> > In my experiments, in every heart beat, each TT has 2 free map slots
>> > but is assigned only 1 map task, and hence, the utilization never goes
>> > beyond 50%.
>> >
>> > Of course, this (degenerate) case does not arise when map tasks take
>> > more than one heart beat interval to finish. For example, I repeated
>> > the experiments with map tasks taking close to 15 s to finish and
>> > noticed close to 100% utilization when large jobs were executing.
>> >
>> > Why does the Fair Scheduler not assign more than one map task to a TT
>> > per heart beat? Is this done to spread the load uniformly across the
>> > cluster?
>> > I looked at the assignTasks function in the default Hadoop scheduler
>> > (JobQueueTaskScheduler.java), and it does assign more than 1 map task
>> > per heart beat to a TT.
>> >
>> > It will be easy to change the Fair Scheduler to assign more than 1 map
>> > task to a TT per heart beat (I did that and achieved 100% utilization
>> > even with small map tasks). But I am wondering if doing so will
>> > violate some fairness properties.
>> >
>> > Thanks,
>> > Abhishek
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
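
For anyone who wants to reproduce the 50% ceiling described in the original message without a cluster, here is a toy model of it. It is only a sketch, not the Fair Scheduler code: the class name HeartbeatSim, the method peakBusySlots, and the maxMapsPerHeartbeat parameter are all made up for illustration. It assumes that every TaskTracker heartbeats in lockstep and that every map task finishes before the next heartbeat, so all slots are free again each round.

```java
// Toy model only: HeartbeatSim and its parameters are hypothetical and do not
// correspond to any class or setting in Hadoop.
final class HeartbeatSim {

    /**
     * Returns the peak number of map slots busy at the same time.
     *
     * Assumptions: all trackers heartbeat in lockstep, and every map task
     * finishes within one heartbeat interval, so each tracker has all of its
     * slots free at every heartbeat.
     */
    static int peakBusySlots(int trackers, int slotsPerTracker,
                             int maxMapsPerHeartbeat, int pendingMaps) {
        if (trackers < 1 || slotsPerTracker < 1 || maxMapsPerHeartbeat < 1) {
            throw new IllegalArgumentException("arguments must be >= 1");
        }
        int peak = 0;
        while (pendingMaps > 0) {
            int busy = 0; // slots occupied during this heartbeat interval
            for (int t = 0; t < trackers && pendingMaps > 0; t++) {
                // The scheduler hands out at most maxMapsPerHeartbeat maps,
                // limited by the tracker's free slots and the remaining work.
                int assigned = Math.min(
                        Math.min(slotsPerTracker, maxMapsPerHeartbeat),
                        pendingMaps);
                pendingMaps -= assigned;
                busy += assigned;
            }
            peak = Math.max(peak, busy);
        }
        return peak;
    }

    public static void main(String[] args) {
        int totalSlots = 100 * 2;
        int capped = peakBusySlots(100, 2, 1, 1000);   // 1 map per heartbeat
        int uncapped = peakBusySlots(100, 2, 2, 1000); // fill both slots
        System.out.println("1 map per heartbeat:  " + capped + "/" + totalSlots);
        System.out.println("2 maps per heartbeat: " + uncapped + "/" + totalSlots);
    }
}
```

Under these assumptions, a limit of one map per heartbeat keeps at most 100 of the 200 slots busy, which matches the utilization reported above, while letting the scheduler fill both slots reaches all 200. The model says nothing about fairness between jobs or pools.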
