RE: how to unsubscribed this mail list

2012-09-18 Thread liuzhg
I am facing the same problem.

Does anyone care?

Gump


-----Original Message-----
From: sathyavageeswaran [mailto:sat...@morisonmenon.com] 
Sent: Tuesday, September 18, 2012 2:25 PM
To: common-user@hadoop.apache.org
Subject: RE: how to unsubscribed this mail list

Not allowed

-----Original Message-----
From: 黄 山 [mailto:thuhuang...@gmail.com] 
Sent: 18 September 2012 11:53
To: common-user@hadoop.apache.org
Subject: how to unsubscribed this mail list

I can't find an unsubscribe address for common-user.

There are only addresses for common-dev, common-commit and common-issues.



How to mapreduce in the scenario

2012-05-29 Thread liuzhg
Hi,
 
I wonder whether Hadoop can effectively solve the following problem:
 
==
input file: a.txt, b.txt
result: c.txt
 
a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...
 
b.txt: 
id1,address1,...
id2,address2,...
id3,address3,...

c.txt
id1,name1,age1,address1,...
id2,name2,age2,address2,...

 
I know that a database can do this well.
But I want to handle it with Hadoop if possible.
Can Hadoop meet this requirement?
 
Any suggestions would help. Thank you very much!
 
Best Regards,
 
Gump
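
To make the expected result concrete, here is a minimal in-memory sketch of the join described in this message, written in plain Java with no Hadoop classes. It is a hash join: index b.txt by id, then stream a.txt and look each id up. The class and method names (InMemoryJoinSketch, splitId, join) are hypothetical, and the sketch assumes each id appears at most once in b.txt and that both inputs fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java hash join of two id-keyed CSV inputs; illustrative only, not Hadoop code. */
final class InMemoryJoinSketch {

    /** Splits "id,rest" at the first comma; returns null for lines without a comma. */
    static String[] splitId(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) return null;
        return new String[] { line.substring(0, comma), line.substring(comma + 1) };
    }

    /** Inner join: keeps only ids present in both inputs, in the order of aLines. */
    static List<String> join(List<String> aLines, List<String> bLines) {
        // Build side: index the b.txt records by id (assumes unique ids in b.txt).
        Map<String, String> restById = new HashMap<>();
        for (String line : bLines) {
            String[] kv = splitId(line);
            if (kv != null) restById.put(kv[0], kv[1]);
        }
        // Probe side: stream the a.txt records and append the matching b.txt fields.
        List<String> joined = new ArrayList<>();
        for (String line : aLines) {
            String[] kv = splitId(line);
            if (kv == null) continue; // skip malformed lines
            String rest = restById.get(kv[0]);
            if (rest != null) joined.add(kv[0] + "," + kv[1] + "," + rest);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList(
            "id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2", "id3,address3");
        for (String row : join(a, b)) System.out.println(row);
    }
}
```

With the sample data above, id4 is dropped because it has no matching line in b.txt. Once neither file fits in memory on a single machine, this approach no longer applies, and that is the case where a distributed join becomes interesting.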




Re: How to mapreduce in the scenario

2012-05-29 Thread liuzhg
Hi,

Mike, Nitin, Devaraj, Soumya, samir, Robert 

Thank you all for your suggestions.

Actually, I want to know whether Hadoop has any performance advantage over
a conventional database for this kind of problem (joining data).

 

Best Regards,

Gump

 

 

On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
soumya.sbaner...@gmail.com wrote:

Hi,

You can also try Hadoop's reduce-side join functionality.
Look into contrib/datajoin/hadoop-datajoin-*.jar for the base map and
reduce classes to do this.

Regards,
Soumya.


On Tue, May 29, 2012 at 4:10 PM, Devaraj k devara...@huawei.com wrote:

 Hi Gump,

   MapReduce is well suited to this type of problem (joins).

 I hope this helps you solve the problem you described.

 1. Map output key and value classes: Use Text as the map output key
 class and write a value class (CombinedValue). The value class
 should be able to hold the values from both files (a.txt and b.txt), as
 shown below.

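 import java.io.*;
 import org.apache.hadoop.io.Writable;
 class CombinedValue implements Writable { // map output values must be Writable
   String name = "";
   int age;
   String address = "";
   boolean isLeft; // flag to identify from which file
   public void write(DataOutput out) throws IOException { // serialize all fields
     out.writeUTF(name); out.writeInt(age); out.writeUTF(address); out.writeBoolean(isLeft);
   }
   public void readFields(DataInput in) throws IOException { // read in the same order
     name = in.readUTF(); age = in.readInt(); address = in.readUTF(); isLeft = in.readBoolean();
   }
 }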

 2. Mapper: Write a map() function that parses records from both
 files (a.txt, b.txt) and emits the common output key and value classes.

 3. Partitioner: Make sure the partitioner sends all (key, value) pairs
 that share a key to the same reducer.

 4. Reducer: In the reduce() function, you receive the records for a key
 from both files and can combine them easily.
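
The four steps above can be walked through in memory with a minimal sketch in plain Java, without any Hadoop classes. It is an illustration of the idea, not code for the Hadoop API: the class ReduceSideJoinSketch, the Tagged value type, and the map/reduce/join method names are all hypothetical, and a sorted map stands in for the partition and shuffle phase that the framework would normally perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory walk-through of a reduce-side join; plain Java, no Hadoop classes. */
final class ReduceSideJoinSketch {

    /** Hypothetical tagged value: the fields after the id plus a source-file flag. */
    static final class Tagged {
        final boolean isLeft;  // true if the record came from a.txt
        final String fields;   // everything after the id
        Tagged(boolean isLeft, String fields) { this.isLeft = isLeft; this.fields = fields; }
    }

    /** "Map" step: split a line into (id, tagged remainder) and group it under its id. */
    static void map(String line, boolean isLeft, Map<String, List<Tagged>> shuffle) {
        int comma = line.indexOf(',');
        if (comma < 0) return; // skip malformed lines
        String id = line.substring(0, comma);
        shuffle.computeIfAbsent(id, k -> new ArrayList<>())
               .add(new Tagged(isLeft, line.substring(comma + 1)));
    }

    /** "Reduce" step: for one id, pair every a.txt record with every b.txt record. */
    static List<String> reduce(String id, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            if (t.isLeft) left.add(t.fields); else right.add(t.fields);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) out.add(id + "," + l + "," + r);
        }
        return out; // empty when the id appears in only one file (inner join)
    }

    /** Runs map, grouping by key (standing in for partition/shuffle), then reduce. */
    static List<String> join(List<String> aLines, List<String> bLines) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // keys sorted, like reducer input
        for (String line : aLines) map(line, true, shuffle);
        for (String line : bLines) map(line, false, shuffle);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2");
        for (String row : join(a, b)) System.out.println(row);
    }
}
```

In a real job the grouping is done by the framework, and Hadoop's default hash partitioner already routes equal keys to the same reducer, so a custom partitioner is only needed when the map output key carries more than the join id.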


 Thanks
 Devaraj


 
 From: liuzhg [liu...@cernet.com]
 Sent: Tuesday, May 29, 2012 3:45 PM
 To: common-user@hadoop.apache.org
 Subject: How to mapreduce in the scenario

 Hi,

 I wonder whether Hadoop can effectively solve the following problem:

 ==
 input file: a.txt, b.txt
 result: c.txt

 a.txt:
 id1,name1,age1,...
 id2,name2,age2,...
 id3,name3,age3,...
 id4,name4,age4,...

 b.txt:
 id1,address1,...
 id2,address2,...
 id3,address3,...

 c.txt
 id1,name1,age1,address1,...
 id2,name2,age2,address2,...
 

 I know that a database can do this well.
 But I want to handle it with Hadoop if possible.
 Can Hadoop meet this requirement?

 Any suggestions would help. Thank you very much!

 Best Regards,

 Gump