Hi Pankil,

Basically there are two steps here. The first is to sort the two files. This can be done with a MapReduce job in which the mapper extracts the join column as the key.
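
To make that sort step a bit more concrete, here is a rough plain-Python simulation of it (not Hadoop API code). It assumes tab-separated records with the join column in field 0, and it uses `zlib.crc32` as a stand-in for the framework's hash partitioner. The names `map_record`, `partition` and `sort_job` are made up for this sketch.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # assumption: the same value is used when sorting both inputs


def map_record(line, join_col=0, sep="\t"):
    """Mapper: emit (join key, whole record)."""
    record = line.rstrip("\n")
    return record.split(sep)[join_col], record


def partition(key, num_reducers=NUM_REDUCERS):
    """Deterministic hash partitioner: equal keys always get the same part number."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sort_job(lines, num_reducers=NUM_REDUCERS):
    """Simulate map -> shuffle/sort -> identity reduce.

    Returns {part number: list of records sorted by join key}.
    """
    parts = defaultdict(list)
    for line in lines:
        key, record = map_record(line)
        parts[partition(key, num_reducers)].append((key, record))
    # Each "reducer" just writes its records out in key order.
    return {p: [rec for _, rec in sorted(parts[p])] for p in range(num_reducers)}
```

Running `sort_job` once per input with the same `num_reducers` gives two sets of part files in which a given join key always falls into the same part number on both sides.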
If you make sure you have the same number of reducers (and partition by the equijoin column) for both sorts, then you'll end up with:

A         B
part-0    part-0
part-1    part-1
etc.

Each part file will be in sorted order, and because both sorts used the same partitioning, any given join key lands in the same-numbered part file on both sides. That means you can perform the merge one pair of part files at a time.

To do the merge, you can just pick either A or B as your input for locality hints, and then, in the mapper, given the file name, determine the filename of the other partition. Open that up as a side input in your mapper and perform the merge like you would in a non-distributed setting.

Hope this helps
-Todd

On Thu, Jul 9, 2009 at 9:09 AM, Pankil Doshi <forpan...@gmail.com> wrote:

> Hi,
>
> Does anyone has hint on how to implement "SORT-MERGE JOIN" using map-reduce
> paradigm?
> I read article regarding it on Pig wiki but did not got clarity as it
> doesn't show in form of map and reduce.
>
> Pankil
>
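
For the merge step in the reply above, a minimal plain-Python sketch may help. It is only an illustration under a few assumptions: records are tab-separated with the join key in field 0, both inputs are already sorted by that key using the same ordering that the comparison here uses, and the helper names `other_side_path` and `merge_join` are hypothetical. In a real job the two iterables would be the mapper's own input split and the side file it opens itself.

```python
import os


def other_side_path(input_path, other_dir):
    """Given e.g. 'A/part-0', return the matching part file under other_dir."""
    return os.path.join(other_dir, os.path.basename(input_path))


def merge_join(left, right, key=lambda rec: rec.split("\t")[0]):
    """Sort-merge equijoin of two record streams sorted by join key.

    Yields (left_record, right_record) for every pair with equal keys.
    """
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None and r is not None:
        kl, kr = key(l), key(r)
        if kl < kr:
            l = next(left, None)    # left key is behind: advance left
        elif kl > kr:
            r = next(right, None)   # right key is behind: advance right
        else:
            # Buffer the run of right-hand records sharing this key ...
            run = []
            while r is not None and key(r) == kl:
                run.append(r)
                r = next(right, None)
            # ... and pair it with every left-hand record that has the key.
            while l is not None and key(l) == kl:
                for rr in run:
                    yield l, rr
                l = next(left, None)
```

Only the run of right-hand records for the current key is held in memory; everything else is streamed, which is the main attraction of merging pre-sorted part files.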