Re: Chaining Multiple Map reduce jobs.
Hi,

I am far from being a Hadoop expert, but I think you cannot start a Map task of the next job until the previous Reduce has finished. That means you probably have to store the Map output to disk first, because (a) it may not fit into memory and (b) you would risk data loss if the system crashed.

As for job chaining, you can check the JobControl class:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html

You can also look at https://issues.apache.org/jira/browse/HADOOP-3702

Regards,
Lukas

On Wed, Apr 8, 2009 at 11:30 PM, asif md asif.d...@gmail.com wrote:
> Hi everyone,
> I have to chain multiple MapReduce jobs, actually 2 to 4 jobs, and each job depends on the output of the preceding job. In the reducer of each job I'm doing very little, just grouping by key from the maps. I want to give the output of one MapReduce job to the next job without having to go to disk. Does anyone have any ideas on how to do this?
> Thanks.

--
http://blog.lukas-vlcek.com/
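To illustrate why the next job's Map cannot start before the previous Reduce has finished, here is a toy sketch in plain Java. It is not Hadoop code and does not use the Hadoop API; the class and method names (ChainSketch, groupByKey, mapJob1, mapJob2, runChain) are made up for this example. It chains two "jobs" whose reducers only group by key, as in the original question, and keeps everything in memory, which only works here because the data is tiny.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of two chained jobs; hypothetical names, not Hadoop classes.
class ChainSketch {

    // "Reduce" step that only groups values by key, as described in the question.
    static Map<String, List<String>> groupByKey(List<String[]> mapOutput) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : mapOutput) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    // Job 1 map: emit (word length, word) for every input word.
    static List<String[]> mapJob1(List<String> words) {
        List<String[]> out = new ArrayList<>();
        for (String w : words) {
            out.add(new String[] {String.valueOf(w.length()), w});
        }
        return out;
    }

    // Job 2 map: read job 1's grouped output and emit (group size, job 1 key).
    static List<String[]> mapJob2(Map<String, List<String>> job1Output) {
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : job1Output.entrySet()) {
            out.add(new String[] {String.valueOf(e.getValue().size()), e.getKey()});
        }
        return out;
    }

    // Job 1 must finish completely before job 2 starts: a group's size is
    // only known once every record of the input has been seen.
    static Map<String, List<String>> runChain(List<String> words) {
        Map<String, List<String>> job1 = groupByKey(mapJob1(words));
        return groupByKey(mapJob2(job1));
    }

    public static void main(String[] args) {
        System.out.println(runChain(List.of("a", "bb", "cc", "d", "eee")));
    }
}
```

If job 2 had started after job 1 had seen only "a" and "bb", it would have recorded both groups with size 1, which is wrong once "cc" and "d" arrive. In a real cluster the intermediate result also has to survive task failures, which is the other reason given above for writing it to disk.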
Re: Chaining Multiple Map reduce jobs.
You can also try decreasing the replication factor (the dfs.replication setting) for the intermediate files between jobs. With fewer replicas to write, writing those files will be faster.

On Apr 8, 2009, at 3:14 PM, Lukáš Vlček wrote:
> I am far from being a Hadoop expert, but I think you cannot start a Map task of the next job until the previous Reduce has finished. That means you probably have to store the Map output to disk first, because (a) it may not fit into memory and (b) you would risk data loss if the system crashed.
Re: Chaining Multiple Map reduce jobs.
Chapter 8 of my book covers this in detail; the alpha chapter should be available at the Apress web site. Chain mapping rules!
http://www.apress.com/book/view/1430219424

On Wed, Apr 8, 2009 at 3:30 PM, Nathan Marz nat...@rapleaf.com wrote:
> You can also try decreasing the replication factor for the intermediate files between jobs. This will make writing those files faster.

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422