Well, I want to do some experimentation with Hadoop. I need to partition two datasets using the same partitioning function and then, in the next job, take the same partition from both datasets and apply some operation in the mapper. But how can I ensure that one mapper gets the same partition from both sources?
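
To make the goal concrete, here is a minimal plain-Java sketch of the idea, with no Hadoop dependency. Everything in it is an assumption made for illustration: the class name `CoPartitionSketch`, the hash-mod function `partitionOf`, and the representation of records as `{key, value}` string pairs. It only shows why using one partitioning function with the same number of partitions on both datasets means partition i of one dataset has to be paired only with partition i of the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoPartitionSketch {

    // Hypothetical partitioning function (hash of the key modulo the partition count).
    // The sign bit is masked so the result is never negative.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Split {key, value} records into numPartitions buckets using partitionOf.
    static List<List<String[]>> partition(List<String[]> records, int numPartitions) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String[] r : records) {
            parts.get(partitionOf(r[0], numPartitions)).add(r);
        }
        return parts;
    }

    // Join one partition of the left dataset with the same-numbered partition
    // of the right dataset. This stands in for the work of a single mapper.
    static List<String> joinPartition(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] r : right) {
            index.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String v : index.getOrDefault(l[0], new ArrayList<>())) {
                out.add(l[0] + "\t" + l[1] + "\t" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"c", "3"});
        List<String[]> right = Arrays.asList(
                new String[] {"a", "x"}, new String[] {"c", "y"}, new String[] {"d", "z"});

        int numPartitions = 3; // must be identical for both datasets
        List<List<String[]>> leftParts = partition(left, numPartitions);
        List<List<String[]>> rightParts = partition(right, numPartitions);

        // Matching keys always land in the same partition index on both sides,
        // so each "mapper" only needs partition i from each dataset.
        for (int i = 0; i < numPartitions; i++) {
            for (String row : joinPartition(leftParts.get(i), rightParts.get(i))) {
                System.out.println("partition " + i + ": " + row);
            }
        }
    }
}
```

In Hadoop terms this would presumably mean running both partitioning jobs with the same partitioner and the same number of reducers, so that the Nth output file of each job holds the same key range. As far as I know, the `org.apache.hadoop.mapred.join` package (`CompositeInputFormat`) is meant for map-side joins over inputs that are partitioned and sorted identically, but I have not verified that it fits this case.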
________________________________
From: Hemanth Yamijala <[email protected]>
To: [email protected]
Sent: Tue, July 6, 2010 5:40:49 AM
Subject: Re: Partitioned Datasets Map/Reduce

Hi,

> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then in the next
> mapreduce job, I want each mapper to handle the same partition from the two
> sources and perform some function such as joining etc. How can I ensure that
> one mapper gets the split that corresponds to the same partition from both the
> sources?
>

Not really an answer to your specific question, but have you taken a look at Pig (http://hadoop.apache.org/pig), which is suitable for operations like joining data sets?
