Yes, this is possible. Use MultipleInputs to attach a separate mapper to each
input file (so two different mappers here).

Step 1:

MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class,
    Mapper1.class);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class,
    Mapper2.class);

When writing the two mappers, prefix each output value with an identifier for
its source file (a.txt or b.txt), for example
output.collect(new Text(key), new Text(identifier + "~" + value));
so that the reducer can easily tell which mapper each value came from.
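
A minimal sketch of that tagging step, written in plain Java without the
Hadoop classes so it can be run on its own. TaggedRecord and tag are
hypothetical names, and it assumes the id is the first comma-separated field
of each line. Inside a real mapper the returned key and value would go to
output.collect(...) as above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: builds the (key, tagged value)
// pair that each mapper would emit.
class TaggedRecord {
    static final String SEPARATOR = "~";

    // Assumes the join key (id) is the first comma-separated field.
    // "id1,name1,age1" with identifier "A" becomes ("id1", "A~name1,age1").
    static Map.Entry<String, String> tag(String identifier, String line) {
        int comma = line.indexOf(',');
        String key = comma < 0 ? line : line.substring(0, comma);
        String rest = comma < 0 ? "" : line.substring(comma + 1);
        return new SimpleEntry<>(key, identifier + SEPARATOR + rest);
    }
}
```

Mapper1 would call tag("A", line) for a.txt and Mapper2 would call
tag("B", line) for b.txt.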


Step 2:
  Put b.txt in the distributed cache and compare the reducer values against
the b.txt list:
            String currValue = values.next().toString();
            // Split only on the first "~" so the payload may contain "~" too.
            String[] valueSplitted = currValue.split("~", 2);
            if (valueSplitted[0].equals("A")) // "A": identifier from the A mapper
            {
                // process the a.txt record here
            }
            else if (valueSplitted[0].equals("B")) // "B": identifier from the B mapper
            {
                // process the b.txt record here
            }

            output.collect(new Text(key),
                    new Text("value formatted the way you want to display it"));
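
As a rough illustration of what the reducer could do with the tagged values
for one key, here is a plain-Java sketch (no Hadoop classes, so it runs on its
own). JoinReducerSketch and join are hypothetical names, and the inner-join
behaviour (ids present in only one file are dropped) is an assumption based on
the c.txt example in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: joins the tagged values that
// arrive at the reducer for a single key.
class JoinReducerSketch {
    // Returns one output line per (A record, B record) pair for this key,
    // e.g. "id1,name1,age1,address1". Values without a tag are skipped.
    static List<String> join(String key, List<String> taggedValues) {
        List<String> fromA = new ArrayList<>();
        List<String> fromB = new ArrayList<>();
        for (String value : taggedValues) {
            String[] parts = value.split("~", 2);
            if (parts.length < 2) {
                continue; // no identifier present
            }
            if (parts[0].equals("A")) {
                fromA.add(parts[1]);
            } else if (parts[0].equals("B")) {
                fromB.add(parts[1]);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String a : fromA) {
            for (String b : fromB) {
                joined.add(key + "," + a + "," + b);
            }
        }
        return joined;
    }
}
```

In the real reducer, each joined line would be passed to output.collect(...)
instead of being collected in a list.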



Choose the key according to the result you want to produce.

After that, use one reducer to produce the output.

thanks
samir

On Tue, May 29, 2012 at 3:45 PM, liuzhg <liu...@cernet.com> wrote:

> Hi,
>
> I wonder whether Hadoop can effectively solve the following problem:
>
> ==========================================
> input file: a.txt, b.txt
> result: c.txt
>
> a.txt:
> id1,name1,age1,...
> id2,name2,age2,...
> id3,name3,age3,...
> id4,name4,age4,...
>
> b.txt:
> id1,address1,...
> id2,address2,...
> id3,address3,...
>
> c.txt
> id1,name1,age1,address1,...
> id2,name2,age2,address2,...
> ========================================
>
> I know that it can be done well by database.
> But I want to handle it with hadoop if possible.
> Can hadoop meet the requirement?
>
> Any suggestion can help me. Thank you very much!
>
> Best Regards,
>
> Gump
>
>
>
