Chapter 8 of my book covers this in detail; the alpha chapter should be available at the Apress web site. Chain mapping rules!
http://www.apress.com/book/view/1430219424
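
For readers unfamiliar with the chaining idea, here is a minimal plain-Java sketch. It is not Hadoop API code and is not taken from the book; every name in it (ChainSketch, tokenize, dropShort, push, wordCounts) is hypothetical. It only illustrates the general principle: each record emitted by one map stage is handed straight to the next stage in memory, and a single group-by-key step runs at the end, so nothing is written out between the map stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java illustration only; no Hadoop classes are used.
// All names here are hypothetical and chosen for the example.
class ChainSketch {

    // Stage 1: split a line into lowercase tokens.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // Stage 2: keep only tokens longer than two characters.
    static List<String> dropShort(String token) {
        List<String> out = new ArrayList<>();
        if (token.length() > 2) {
            out.add(token);
        }
        return out;
    }

    // Push one record through stage `index` and every later stage.
    // Records that survive the last stage are collected in `sink`.
    static void push(String record, int index,
                     List<Function<String, List<String>>> stages,
                     List<String> sink) {
        if (index == stages.size()) {
            sink.add(record);
            return;
        }
        for (String emitted : stages.get(index).apply(record)) {
            push(emitted, index + 1, stages, sink);
        }
    }

    // Run the chained map stages, then group by key and count.
    static Map<String, Integer> wordCounts(List<String> lines) {
        List<Function<String, List<String>>> stages = new ArrayList<>();
        stages.add(ChainSketch::tokenize);
        stages.add(ChainSketch::dropShort);

        List<String> mapped = new ArrayList<>();
        for (String line : lines) {
            push(line, 0, stages, mapped);
        }

        // Stand-in for the "very little" reducer: group by key and count.
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : mapped) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("The cat saw the dog");
        lines.add("A dog saw it");
        System.out.println(wordCounts(lines));
    }
}
```

This only helps where consecutive steps are map-like. As noted further down the thread, a step that needs the complete output of an earlier reduce still has to wait for that reduce to finish.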

On Wed, Apr 8, 2009 at 3:30 PM, Nathan Marz <nat...@rapleaf.com> wrote:

> You can also try decreasing the replication factor for the intermediate
> files between jobs. This will make writing those files faster.
>
> On Apr 8, 2009, at 3:14 PM, Lukáš Vlček wrote:
>
>> Hi,
>> I am by no means a Hadoop expert, but I think you cannot start a Map task
>> until the previous Reduce has finished. This means that you probably have
>> to store the Map output on disk first (because a] it may not fit into
>> memory and b] you would risk data loss if the system crashes).
>> As for job chaining, you can check the JobControl class:
>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html
>>
>> Also you can look at https://issues.apache.org/jira/browse/HADOOP-3702
>>
>> Regards,
>> Lukas
>>
>> On Wed, Apr 8, 2009 at 11:30 PM, asif md <asif.d...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I have to chain multiple MapReduce jobs (actually 2 to 4 jobs); each of
>>> the jobs depends on the output of the preceding job. In the reducer of
>>> each job I'm doing very little (just grouping by key from the maps). I
>>> want to give the output of one MapReduce job to the next job without
>>> having to go to disk. Does anyone have any ideas on how to do this?
>>>
>>> Thanks.
>>
>> --
>> http://blog.lukas-vlcek.com/

--
Alpha chapters of my book on Hadoop are available at
http://www.apress.com/book/view/9781430219422