Map-side join: Sort order preserved?

Stuart White Thu, 14 May 2009 08:05:23 -0700

I'm implementing a map-side join as described in chapter 8 of "Pro
Hadoop".  I have two files that have been partitioned using the
TotalOrderPartitioner on the same key into the same number of
partitions.  I've set mapred.min.split.size to Long.MAX_VALUE so that
one Mapper will handle an entire partition.


I want the output to be written in the same partitioned, totally
sorted order.  If possible, I want to accomplish this by setting the
number of reduce tasks to 0 so that the output of my Mappers is
written directly to HDFS, thereby skipping the partition/sort step.

My question is this: Am I guaranteed that the Mapper that processes
part-00000 will have its output written to the output file named
part-00000, that the Mapper that processes part-00001 will have its
output written to part-00001, and so on?
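
To make the mapping in question concrete, here is a minimal sketch in
plain Java (no Hadoop dependency) of the part-NNNNN naming convention
and of a check that an input file and an output file carry the same
partition index.  The class and method names (PartNames,
partitionIndex, partName, sameIndex) are hypothetical and exist only
for illustration; the sketch assumes file names end in "part-"
followed by a zero-padded number.  It does not show whether Hadoop
guarantees the mapping; it only states the property being asked about.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: models the part-NNNNN naming only.
final class PartNames {
    // Matches a trailing "part-" followed by 5 to 9 digits (assumed convention).
    private static final Pattern PART = Pattern.compile("part-(\\d{5,9})$");

    // Returns the partition index encoded in a path such as "/in/part-00003",
    // or -1 if the path does not end in a part-NNNNN name.
    static int partitionIndex(String path) {
        Matcher m = PART.matcher(path);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Formats a partition index as a zero-padded part file name.
    static String partName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    // True only if both paths are part files with the same partition index.
    static boolean sameIndex(String inputPath, String outputPath) {
        int in = partitionIndex(inputPath);
        return in >= 0 && in == partitionIndex(outputPath);
    }
}
```

With a helper like this, a test run could compare each output file
against the input partition whose keys it contains and report any
pair for which sameIndex(...) is false.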

If so, then I can preserve the partitioning/sort order of my input
files without re-partitioning and re-sorting.

Thanks.
