Chapter 8 of my book covers this in detail; the alpha chapter should be
available on the Apress web site.
Chain mapping rules!
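
For anyone wondering what chaining buys you, here is a minimal plain-Java sketch. It is not the Hadoop API: the class ChainSketch and all of its method names are made up for illustration. Hadoop's ChainMapper and ChainReducer follow a [MAP+ / REDUCE MAP*] pattern, where several map stages run inside one task and records pass from stage to stage in memory. The sketch only models that record flow, using a word count as a stand-in workload.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a [MAP+ / REDUCE MAP*] chain.
// Nothing here touches Hadoop; it only shows how records flow.
public class ChainSketch {

    // First map stage: split each input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> tokenize(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Second map stage, chained directly onto the first: normalize the key.
    public static Map.Entry<String, Integer> lowerCaseKey(Map.Entry<String, Integer> pair) {
        return new SimpleEntry<>(pair.getKey().toLowerCase(), pair.getValue());
    }

    // The reduce step from the original question: only group values by key.
    public static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Map stage chained after the reduce: collapse each group to its size.
    public static Map<String, Integer> countGroups(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), group.getValue().size());
        }
        return counts;
    }

    // Runs the whole chain; records move between stages as method
    // arguments and return values, never through intermediate files.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : tokenize(line)) {
                mapped.add(lowerCaseKey(pair));
            }
        }
        return countGroups(groupByKey(mapped));
    }

    public static void main(String[] args) {
        // prints {chain=2, hadoop=1, mapping=1}
        System.out.println(run(Arrays.asList("Hadoop chain", "chain mapping")));
    }
}
```

Note that the pattern allows only one reduce per job, so a second grouping step would still need its own job, with its input written out by the first.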

On Wed, Apr 8, 2009 at 3:30 PM, Nathan Marz wrote:

> You can also try decreasing the replication factor for the intermediate
> files between jobs. This will make writing those files faster.
> On Apr 8, 2009, at 3:14 PM, Lukáš Vlček wrote:
>> Hi,
>> I am by no means a Hadoop expert, but I think you cannot start a Map task
>> until the previous Reduce has finished. That means you probably have to
>> store the Map output on disk first (because a] it may not fit into memory
>> and b] you would risk data loss if the system crashes).
>> As for job chaining, you can check the JobControl class.
>> Also you can look at
>> Regards,
>> Lukas
>> On Wed, Apr 8, 2009 at 11:30 PM, asif md wrote:
>>> Hi everyone,
>>> I have to chain multiple MapReduce jobs (actually 2 to 4 jobs); each of
>>> the jobs depends on the output of the preceding job. In the reducer of
>>> each job I'm doing very little (just grouping by key from the maps). I
>>> want to give the output of one MapReduce job to the next job without
>>> having to go to disk. Does anyone have any ideas on how to do this?
>>> Thanks.
>> --

Alpha Chapters of my book on Hadoop are available
