The output of each map is sorted by partition and by key within that
partition. Each reduce merges the sorted map output assigned to its
partition. The following may be helpful:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
If your job requires total order, consider
o.a.h.mapred.lib.TotalOrderPartitioner. -C
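
To make the partition-then-key ordering concrete, here is a small plain-Python simulation. It is a sketch only and does not use any Hadoop API: hash_partition, range_partition, map_output and reduce_inputs are hypothetical helpers, the crc32 hash and the split point "n" are arbitrary choices for illustration, and range_partition only stands in for the idea behind a total-order partitioner.

```python
from zlib import crc32

def hash_partition(key, num_reduces):
    # Stand-in for a hash partitioner: deterministic hash modulo reduce count.
    return crc32(key.encode("utf-8")) % num_reduces

def range_partition(key, split_points):
    # Stand-in for a total-order partitioner: partition i receives keys
    # below split_points[i]; the last partition receives the rest.
    for i, split in enumerate(split_points):
        if key < split:
            return i
    return len(split_points)

def map_output(records, partition_fn):
    # One map's output: sorted by partition, then by key within the partition.
    return sorted((partition_fn(k), k, v) for k, v in records)

def reduce_inputs(map_outputs, num_reduces):
    # Each reduce sees only the records of its own partition, in key order.
    inputs = [[] for _ in range(num_reduces)]
    for output in map_outputs:
        for part, key, value in output:
            inputs[part].append((key, value))
    return [sorted(rows) for rows in inputs]

maps = [[("pear", 1), ("apple", 1)], [("zebra", 1), ("mango", 1), ("apple", 1)]]

hashed = reduce_inputs(
    [map_output(m, lambda k: hash_partition(k, 2)) for m in maps], 2)
ranged = reduce_inputs(
    [map_output(m, lambda k: range_partition(k, ["n"])) for m in maps], 2)

# Concatenating the reduce inputs in partition order is guaranteed to be
# globally sorted only when partitions are contiguous key ranges.
total = [key for part in ranged for key, _ in part]
```

With the hash stand-in, each partition is sorted on its own but the partitions interleave arbitrarily. With the range stand-in, reading the partitions in order yields one sorted sequence.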
On Mar 3, 2009, at 7:24 PM, Nick Cen wrote:
Can you provide more info about sorting? Does the sort happen on the
whole data set, or just on the specified partition?
2009/3/4 Mikhail Yakshin <greycat.na....@gmail.com>
On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
This is normal behavior. The Reducer is guaranteed to receive all the
results for its partition in sorted order. No reduce can start until all
the maps are completed, since any running map could emit a result that
would violate the order for the results it currently has. -C
_Reducers_ usually start almost immediately and start downloading data
emitted by mappers as they go. This is their first phase. Their second
phase can start only after completion of all mappers. In their second
phase, they're sorting received data, and in their third phase they're
doing real reduction.
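
The three phases can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions, not Hadoop's implementation: run_reduce is a hypothetical helper, the copy phase is represented by simply passing in one already-sorted run per map, and the data is made up.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def run_reduce(sorted_map_outputs, reduce_fn):
    # Phase 1 (copy) is represented by the argument: one sorted run per map.
    # Phase 2 (sort): merge the sorted runs into a single stream in key order.
    merged = heapq.merge(*sorted_map_outputs)
    # Phase 3 (reduce): call reduce_fn once per key with all of its values.
    return [(key, reduce_fn(key, [v for _, v in group]))
            for key, group in groupby(merged, key=itemgetter(0))]

runs = [
    [("apple", 1), ("pear", 2)],
    [("apple", 3), ("mango", 4)],
]
result = run_reduce(runs, lambda key, values: sum(values))

# A map that finishes late can still emit a key smaller than any seen so
# far, so the merged order is only final once every run is present.
late_run = [("aardvark", 5)]
result_with_late = run_reduce(runs + [late_run], lambda key, values: sum(values))
```

The late run changes which key comes first, which is why the merge and the reduction cannot be finalized while any map is still running.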
--
WBR, Mikhail Yakshin
--
http://daily.appspot.com/food/