Hi Mike, Nitin, Devaraj, Soumya, Samir, Robert,
Thank you all for your suggestions.
What I actually want to know is whether Hadoop has any performance
advantage over a conventional database for this kind of problem (joining data).
Best Regards,
Gump
On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
soumya.sbaner...@gmail.com wrote:
Hi,
You can also try Hadoop's reduce-side join functionality.
Look at contrib/datajoin/hadoop-datajoin-*.jar for the base Map and
Reduce classes that implement it.
Regards,
Soumya.
On Tue, May 29, 2012 at 4:10 PM, Devaraj k devara...@huawei.com wrote:
Hi Gump,
MapReduce is a good fit for this type of problem (joins).
I hope the following helps you solve the problem you described.
1. Map output key and value classes: Use Text as the map output key
class and write a value class (CombinedValue) for the map output value.
The value class should be able to hold the values from both files
(a.txt and b.txt), as shown below.
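import java.io.*;
import org.apache.hadoop.io.Writable;
class CombinedValue implements Writable // a value class only needs Writable
{
String name = ""; // empty when the record comes from b.txt
int age;
String address = ""; // empty when the record comes from a.txt
boolean isLeft; // flag to identify from which file
public void write(DataOutput out) throws IOException {
out.writeUTF(name); out.writeInt(age); out.writeUTF(address); out.writeBoolean(isLeft); }
public void readFields(DataInput in) throws IOException {
name = in.readUTF(); age = in.readInt(); address = in.readUTF(); isLeft = in.readBoolean(); }
}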
2. Mapper: Write a map() function that parses records from both files
(a.txt and b.txt) and emits the common output key and value classes.
3. Partitioner: Make sure the partitioner sends all (key, value) pairs
that have the same key to the same reducer (the default hash partitioner
already does this).
4. Reducer: In the reduce() function you will receive the records from
both files for each key, and you can combine them easily.
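The four steps above can be sketched in plain Java. This is only a
simulation of the map, shuffle and reduce phases with in-memory
collections, not code that runs against the Hadoop API. The class name
ReduceSideJoinSketch, the Tagged value type and the join() helper are
made up for illustration, and the records are assumed to be simple
comma-separated lines with the id in the first field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of a reduce-side join.
public class ReduceSideJoinSketch {

    /** Tagged value (same idea as CombinedValue): one record from either file. */
    static final class Tagged {
        final boolean isLeft;  // true: record came from a.txt, false: from b.txt
        final String fields;   // everything after the id
        Tagged(boolean isLeft, String fields) {
            this.isLeft = isLeft;
            this.fields = fields;
        }
    }

    /** "Map" step: split a line into (id, tagged remainder) and group it by id. */
    static void map(String line, boolean isLeft, Map<String, List<Tagged>> shuffle) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            return; // skip lines that have no id separator
        }
        String id = line.substring(0, comma);
        String rest = line.substring(comma + 1);
        shuffle.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged(isLeft, rest));
    }

    /** "Reduce" step: for one id, pair every a.txt record with every b.txt record. */
    static void reduce(String id, List<Tagged> values, List<String> out) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            (v.isLeft ? left : right).add(v.fields);
        }
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
    }

    /** Runs map over both inputs, groups by key, then reduces each group. */
    static List<String> join(List<String> a, List<String> b) {
        // The TreeMap stands in for the shuffle: values grouped and sorted by key.
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        for (String line : a) {
            map(line, true, shuffle);
        }
        for (String line : b) {
            map(line, false, shuffle);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2",
                                       "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2", "id3,address3");
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

Because reduce() only emits a row when both sides are present, this
behaves like an inner join: id4 has no matching line in b.txt and is
dropped. In a real job the grouping done here by the TreeMap is what the
framework's shuffle provides.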
Thanks
Devaraj
From: liuzhg [liu...@cernet.com]
Sent: Tuesday, May 29, 2012 3:45 PM
To: common-user@hadoop.apache.org
Subject: How to mapreduce in the scenario
Hi,
I wonder whether Hadoop can effectively solve the following problem:
==
input file: a.txt, b.txt
result: c.txt
a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...
b.txt:
id1,address1,...
id2,address2,...
id3,address3,...
c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
I know that this can be done well with a database,
but I want to handle it with Hadoop if possible.
Can Hadoop meet this requirement?
Any suggestions would help. Thank you very much!
Best Regards,
Gump