Also, if one source of the join is small enough to fit in memory, you can
build an in-memory table and do the map-side join on unsorted data.
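
A minimal sketch of that idea in plain Python (this is not the Hadoop API; the function names and sample records are made up for illustration). The small side is loaded into a dict keyed on the join key, and the large side is then streamed through in whatever order it arrives:

```python
from collections import defaultdict

def build_table(small_records):
    """Index the small input by join key; it has to fit in memory."""
    table = defaultdict(list)
    for key, value in small_records:
        table[key].append(value)
    return table

def map_side_join(large_records, table):
    """Stream the large input in any order and emit inner-join matches."""
    for key, value in large_records:
        # .get() avoids creating empty entries for keys with no match
        for small_value in table.get(key, ()):
            yield key, value, small_value

# Hypothetical sample data: a small "users" input and a larger "clicks" input.
users = [(1, "alice"), (2, "bob")]
clicks = [(2, "/home"), (1, "/faq"), (3, "/x"), (2, "/faq")]
joined = list(map_side_join(clicks, build_table(users)))
```

In a real job each map task would build the table once before processing its split, so the cost of loading the small input is paid per task, not per record.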


On 5/21/08 11:43 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

> 
> On May 21, 2008, at 11:16 AM, Shirley Cohen wrote:
> 
>> How does one do a join operation in map reduce? Is there more than
>> one way to do a join? Which way works better and why?
> 
> There are a couple of ways, depending on what you need to do. If your
> input data is sorted and partitioned equivalently on the same key,
> you can do a join before the map (aka map-side join). The
> documentation is at:  http://tinyurl.com/5v4rot
> 
> If your data is not sorted and partitioned consistently, you need to
> do the join in the reduce. There is a library to help at:
> http://tinyurl.com/5cz669
> 
> -- Owen
> 
> 

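For the reduce-side case quoted above, the usual pattern is to tag each record with its source in the map, let the shuffle group records by key, and pair the two sides in the reduce. A rough sketch in plain Python, with the shuffle simulated by a sort; none of these names come from Hadoop or from the library linked above, and the sample data is invented:

```python
from itertools import groupby
from operator import itemgetter

def tag(records, source):
    """Map step: attach a source tag so the reducer can tell the sides apart."""
    for key, value in records:
        yield key, (source, value)

def reduce_join(key, tagged_values):
    """Reduce step: pair every left value with every right value for one key."""
    left = [v for s, v in tagged_values if s == "L"]
    right = [v for s, v in tagged_values if s == "R"]
    for l in left:
        for r in right:
            yield key, l, r

def reduce_side_join(left_records, right_records):
    mapped = list(tag(left_records, "L")) + list(tag(right_records, "R"))
    mapped.sort(key=itemgetter(0))  # stand-in for the framework's shuffle/sort
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [tagged for _, tagged in group]
        yield from reduce_join(key, values)

# Hypothetical sample data; neither input needs to be sorted beforehand.
users = [(2, "bob"), (1, "alice")]
clicks = [(2, "/home"), (1, "/faq"), (3, "/x"), (2, "/faq")]
result = list(reduce_side_join(users, clicks))
```

The reducer buffers all values for a key in memory here, which is fine for a sketch but would need care for heavily skewed keys.
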