Yes, this is possible by using MultipleInputs to assign a different mapper to each input file (basically two different mappers).

Step 1: Register one mapper per input file:

    MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, Mapper1.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, Mapper2.class);

When defining the two mappers, prefix each output value with an identifier for its source file (a.txt or b.txt), so that it is easy to distinguish the two mappers' output within the reducer:

    output.collect(new Text(key), new Text(identifier + "~" + value));

Step 2: Put b.txt in the distributed cache and compare the reducer values against the b.txt list:

    String currValue = values.next().toString();
    String valueSplitted[] = currValue.split("~");
    if (valueSplitted[0].equals("A")) // "A": identifier from the A mapper
    {
        // process the a.txt record here
    }
    else if (valueSplitted[0].equals("B")) // "B": identifier from the B mapper
    {
        // process the b.txt record here
    }
    output.collect(new Text(key), new Text("value formatted the way you want to display it"));

Choose the key according to the result you want to produce; for this join, that would be the id field. After that, use a single reducer to produce the output.

Thanks,
Samir

On Tue, May 29, 2012 at 3:45 PM, liuzhg <liu...@cernet.com> wrote:

> Hi,
>
> I wonder whether Hadoop can effectively solve the following problem:
>
> ==========================================
> input file: a.txt, b.txt
> result: c.txt
>
> a.txt:
> id1,name1,age1,...
> id2,name2,age2,...
> id3,name3,age3,...
> id4,name4,age4,...
>
> b.txt:
> id1,address1,...
> id2,address2,...
> id3,address3,...
>
> c.txt
> id1,name1,age1,address1,...
> id2,name2,age2,address2,...
> ========================================
>
> I know that this can be done well with a database.
> But I want to handle it with Hadoop if possible.
> Can Hadoop meet the requirement?
>
> Any suggestions would help. Thank you very much!
>
> Best Regards,
>
> Gump
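
P.S. In case it helps to see the tagging idea end to end, here is a rough sketch in plain Java. It does not use any Hadoop classes: a TreeMap stands in for the shuffle, and the class name TaggedJoinSketch, its methods, and the sample records are all made up for illustration. It also assumes you want an inner join (only ids present in both files are emitted), which is my reading of the c.txt example rather than something stated in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the tagged reduce-side join described above.
// No Hadoop API is used; every name in this class is hypothetical.
public class TaggedJoinSketch {

    // "Map" step: split each "id,rest" line and emit (id, tag~rest).
    // The grouped map stands in for the shuffle that groups values by key.
    static void map(String tag, List<String> lines, Map<String, List<String>> grouped) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip malformed records that have no id separator
            }
            String id = line.substring(0, comma);
            String rest = line.substring(comma + 1);
            grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(tag + "~" + rest);
        }
    }

    // "Reduce" step for one key: separate the values by tag, then pair
    // every A record with every B record (assumed inner-join semantics).
    static List<String> reduce(String id, List<String> values) {
        List<String> aSide = new ArrayList<>();
        List<String> bSide = new ArrayList<>();
        for (String value : values) {
            // Limit 2 so a '~' inside the record itself is left untouched.
            String[] parts = value.split("~", 2);
            if (parts[0].equals("A")) {
                aSide.add(parts[1]);
            } else if (parts[0].equals("B")) {
                bSide.add(parts[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String a : aSide) {
            for (String b : bSide) {
                out.add(id + "," + a + "," + b);
            }
        }
        return out;
    }

    // Runs both "mappers", then the "reducer" once per key in sorted key order.
    static List<String> join(List<String> aLines, List<String> bLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        map("A", aLines, grouped);
        map("B", bLines, grouped);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            result.addAll(reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records in the shape of a.txt and b.txt.
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2");
        // Prints id1,name1,age1,address1 then id2,name2,age2,address2;
        // id4 is dropped because it has no matching B record.
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

In a real job the two map() calls would correspond to Mapper1 and Mapper2, and reduce() to the body of the reducer; if you need unmatched a.txt records kept as well, the pairing loop is the part to change.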