I'm implementing a map-side join as described in chapter 8 of "Pro Hadoop". I have two files that were each partitioned with the TotalOrderPartitioner on the same key and into the same number of partitions. I've set mapred.min.split.size to Long.MAX_VALUE so that each Mapper handles an entire partition.
I want the output to keep the same partitioning and total sort order. If possible, I'd like to do this by setting the number of reducers to 0, so that my Mappers write their output directly to HDFS and the partition/sort step is skipped entirely. My question is: am I guaranteed that the Mapper that processes part-00000 will write its output to the file named part-00000, the Mapper that processes part-00001 will write to part-00001, and so on? If so, I can preserve the partitioning and sort order of my input files without re-partitioning and re-sorting. Thanks.
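To make the hoped-for invariant concrete, here is a minimal plain-Java sketch (no Hadoop dependency). It assumes the old-API naming scheme of "part-" plus a five-digit, zero-padded index for both the input partitions and the map-only output files, and that map task i writes part-i. The class and method names (`PartAlignment`, `partName`, `preservesPartitioning`) are hypothetical and exist only for illustration; the sketch models the question and does not reflect how Hadoop actually assigns splits to tasks.

```java
import java.util.List;

/**
 * Hypothetical helper modelling the invariant asked about:
 * map task i reads input part-i and writes output part-i.
 * Assumes the naming scheme "part-" + five-digit zero-padded index.
 */
public class PartAlignment {

    // Builds a name such as "part-00003" from a zero-based task index.
    static String partName(int index) {
        return String.format("part-%05d", index);
    }

    /**
     * Returns true if every map task consumed the input file whose name
     * matches the output file it would write under the assumed scheme.
     * inputsByTask.get(i) is the input file read by map task i.
     */
    static boolean preservesPartitioning(List<String> inputsByTask) {
        for (int task = 0; task < inputsByTask.size(); task++) {
            if (!inputsByTask.get(task).equals(partName(task))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Aligned: task 0 read part-00000, task 1 read part-00001.
        System.out.println(preservesPartitioning(List.of("part-00000", "part-00001")));
        // Misaligned: the tasks received the inputs in a different order.
        System.out.println(preservesPartitioning(List.of("part-00001", "part-00000")));
    }
}
```

If map task indices were assigned in some order other than input file name order, the check would fail even though each individual output file would still be internally sorted.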