The ChainMapper class, introduced in Hadoop 0.19, lets you run an arbitrary number of mappers one after the other, in the context of a single job. The one issue to be aware of is that each mapper in the chain sees only the output of the previous mapper in the chain.
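To make that caveat concrete, here is a rough sketch in plain Java. It does not call any Hadoop API (in Hadoop itself the chain is wired up with ChainMapper.addMapper on the job configuration); it only simulates the data flow. The names ChainSketch, Stage, runChain, SPLIT_WORDS, UPPERCASE, DROP_SHORT and demo are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class ChainSketch {
    // Hypothetical stand-in for a mapper: one input record in, zero or more records out.
    interface Stage extends Function<String, List<String>> {}

    // Run the stages in order. Each stage receives only the records
    // emitted by the stage before it, never the original input.
    static List<String> runChain(List<String> input, List<Stage> stages) {
        List<String> current = input;
        for (Stage stage : stages) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                next.addAll(stage.apply(record));
            }
            current = next;
        }
        return current;
    }

    // Stage 1: split a line into words.
    static final Stage SPLIT_WORDS = line -> Arrays.asList(line.trim().split("\\s+"));
    // Stage 2: upper-case each word (it sees words, not the original lines).
    static final Stage UPPERCASE = word -> Collections.singletonList(word.toUpperCase());
    // Stage 3: drop words of two characters or fewer.
    static final Stage DROP_SHORT = word -> word.length() > 2
            ? Collections.singletonList(word)
            : Collections.<String>emptyList();

    // Small fixed example used by main.
    static List<String> demo() {
        List<String> input = Arrays.asList("hello big data", "to be");
        return runChain(input, Arrays.asList(SPLIT_WORDS, UPPERCASE, DROP_SHORT));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HELLO, BIG, DATA]
    }
}
```

Note that the third stage has no way to look at the original lines; if a later stage needs something from an earlier input, the intermediate stages have to carry it along in their output.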
There is a nice discussion of this in chapter 8 of Pro Hadoop, published by Apress.

On Sun, Jun 28, 2009 at 5:04 AM, bharath vissapragada <bharathvissapragada1...@gmail.com> wrote:

> See this .. hope this answers your question .
>
> http://developer.yahoo.com/hadoop/tutorial/module4.html#tips
>
> On Sun, Jun 28, 2009 at 5:28 PM, bonito <bonito.pe...@gmail.com> wrote:
>
> > Hello!
> > I am a new hadoop user and my question may sound naive..
> > However, I would like to ask if there is a way to combine the results of
> > two map tasks that may "run" simultaneously.
> > I use the MultipleInput class and thus I have two different mappers.
> > I want the result/output of the one map (associated with one input file)
> > to be used in the process of the second map (associated with the second
> > input file).
> > I have thought of storing the map1 output in the hdfs and retrieving it
> > using the map2.
> > However, I have no clue whether this is possible. I mean...what about
> > time-executing issues? map2 has to wait until map1 is completed...
> >
> > The thought of executing them in a serial manner is not the one I really
> > want...
> >
> > Any suggestion would be appreciated.
> > Thank you in advance :)
> >
> > --
> > View this message in context:
> > http://www.nabble.com/combine-two-map-tasks-tp24240928p24240928.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals