Re: Partitioned Datasets Map/Reduce

abc xyz Tue, 06 Jul 2010 01:50:52 -0700


Well, I want to do some experiments with Hadoop. I need to partition two
datasets using the same partitioning function and then, in the next job, take
the same partition from both datasets and apply some operation to it in the
mapper. But how can I ensure that one mapper gets the same partition from both
sources?
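
A plain-Java simulation of the idea may help frame the question. It has no Hadoop dependencies, and the class and method names (CoPartitionJoin, partitionFor, partitionWiseJoin) are hypothetical. Because both datasets go through the same partitioning function with the same number of partitions, a key that appears in partition i of one dataset can only appear in partition i of the other, so joining partition i with partition i loses no matches. The partition formula mirrors Hadoop's default HashPartitioner, but it is applied to java.lang.String here, so the actual partition numbers differ from what Hadoop would produce for Text keys. In Hadoop itself, both jobs would also need the same number of reduce tasks; the built-in map-side join (CompositeInputFormat in the old mapred API) is documented as additionally expecting each input to be sorted by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of co-partitioning two datasets
// and joining them partition by partition.
public class CoPartitionJoin {

    // Same formula as Hadoop's default HashPartitioner, but applied to
    // java.lang.String hash codes (Hadoop's Text keys hash differently).
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Job 1: split a dataset of {key, value} records into numPartitions buckets.
    static List<List<String[]>> partition(List<String[]> records, int numPartitions) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String[] record : records) {
            parts.get(partitionFor(record[0], numPartitions)).add(record);
        }
        return parts;
    }

    // Job 2, one "mapper": inner-join partition i of A with partition i of B
    // by building an in-memory hash index on the right-hand side.
    static List<String> joinPartition(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] r : right) {
            index.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String v : index.getOrDefault(l[0], Collections.<String>emptyList())) {
                out.add(l[0] + "\t" + l[1] + "\t" + v);
            }
        }
        return out;
    }

    // Both datasets must use the same function AND the same partition count;
    // otherwise matching keys can land in different partition numbers.
    static List<String> partitionWiseJoin(List<String[]> a, List<String[]> b, int numPartitions) {
        List<List<String[]>> pa = partition(a, numPartitions);
        List<List<String[]>> pb = partition(b, numPartitions);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.addAll(joinPartition(pa.get(i), pb.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[] {"u1", "alice"},
                new String[] {"u2", "bob"},
                new String[] {"u3", "carol"});
        List<String[]> orders = Arrays.asList(
                new String[] {"u1", "book"},
                new String[] {"u1", "pen"},
                new String[] {"u3", "lamp"},
                new String[] {"u4", "cup"});
        for (String row : partitionWiseJoin(users, orders, 4)) {
            System.out.println(row);
        }
    }
}
```

Whatever partition count is chosen, the sample join yields the same three rows (u1 twice, u3 once), because no per-partition join ever needs to look outside its own partition number.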

________________________________
From: Hemanth Yamijala <[email protected]>
To: [email protected]
Sent: Tue, July 6, 2010 5:40:49 AM
Subject: Re: Partitioned Datasets Map/Reduce

Hi,

> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then, in the next
> MapReduce job, I want each mapper to handle the same partition from the two
> sources and perform some function such as joining. How can I ensure that
> one mapper gets the split that corresponds to the same partition from both
> sources?
>

Not really an answer to your specific question, but have you taken a
look at Pig (http://hadoop.apache.org/pig), which is suited to
operations like joining datasets?
