Also, if one source of the join is small enough to fit in memory, you can build an in-memory table and do the map-side join on unsorted data.
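
To illustrate the idea only, here is a rough sketch in plain Python rather than the Hadoop API. The function names (build_table, map_side_join) and the sample data are made up for illustration. It assumes the small source fits in each mapper's memory and that records are (key, value) pairs.

```python
from collections import defaultdict

def build_table(small_records):
    """Index the small source by join key: key -> list of values."""
    table = defaultdict(list)
    for key, value in small_records:
        table[key].append(value)
    return table

def map_side_join(table, large_records):
    """Stream the large source in any order and emit inner-join matches."""
    for key, value in large_records:
        # .get() avoids creating empty entries for keys with no match.
        for small_value in table.get(key, ()):
            yield key, small_value, value

# Hypothetical sample data: a small "users" source and a larger,
# unsorted "clicks" source.
users = [(1, "alice"), (2, "bob")]
clicks = [(2, "/home"), (1, "/faq"), (3, "/404"), (2, "/cart")]

table = build_table(users)
joined = list(map_side_join(table, clicks))
```

In a real job each map task would load the table once, before processing its split of the large source, so no sort or shuffle is needed for the join itself.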

On 5/21/08 11:43 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

>
> On May 21, 2008, at 11:16 AM, Shirley Cohen wrote:
>
>> How does one do a join operation in map reduce? Is there more than
>> one way to do a join? Which way works better and why?
>
> There are a couple of ways, depending on what you need to do. If your
> input data is sorted and partitioned equivalently on the same key,
> you can do a join before the map (aka map-side join). The
> documentation is at: http://tinyurl.com/5v4rot
>
> If your data is not sorted and partitioned consistently, you need to
> do the join in the reduce. There is a library to help at:
> http://tinyurl.com/5cz669
>
> -- Owen
>
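
For comparison, the reduce-side approach Owen mentions can be sketched the same way. This is a plain-Python simulation of the general technique, not the library he links to. The tags "L" and "R", the function names, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict

def map_phase(source_tag, records):
    """Tag each record with its source so the reducer can tell them apart."""
    for key, value in records:
        yield key, (source_tag, value)

def shuffle(tagged_pairs):
    """Stand-in for the framework's shuffle: group tagged values by key."""
    groups = defaultdict(list)
    for key, tagged in tagged_pairs:
        groups[key].append(tagged)
    return groups

def reduce_phase(key, tagged_values):
    """Split one key's values by source and emit their cross product."""
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for lv in left:
        for rv in right:
            yield key, lv, rv

def reduce_side_join(left_records, right_records):
    pairs = list(map_phase("L", left_records)) + list(map_phase("R", right_records))
    groups = shuffle(pairs)
    out = []
    for key in sorted(groups):  # reducers see keys in sorted order
        out.extend(reduce_phase(key, groups[key]))
    return out

# Hypothetical sample data; neither input needs to be sorted.
left_input = [(2, "bob"), (1, "alice")]
right_input = [(2, "/home"), (1, "/faq"), (3, "/404"), (2, "/cart")]
result = reduce_side_join(left_input, right_input)
```

The cost compared with the map-side variants is that every record from both sources goes through the shuffle.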