The output of each map is sorted by partition and by key within that partition. Each reduce merges the sorted map outputs assigned to its partition. The following may be helpful:

http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
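To make the sort-then-merge behaviour concrete, here is a toy Python simulation (not Hadoop code, and not its API). Each map sorts its output by (partition, key), and each reduce does a k-way merge of the already-sorted segments for its partition with `heapq.merge` instead of re-sorting everything. `NUM_REDUCES`, `partition`, `map_output` and `reduce_input` are hypothetical names; a deterministic character sum stands in for Hadoop's default HashPartitioner.

```python
import heapq
from operator import itemgetter

NUM_REDUCES = 2  # hypothetical number of reduce tasks


def partition(key, num_reduces=NUM_REDUCES):
    # Stand-in for a hash partitioner: a deterministic sum of code points,
    # used instead of Python's per-process salted hash() for strings.
    return sum(map(ord, key)) % num_reduces


def map_output(records):
    """Tag each (key, value) with its partition, then sort by (partition, key)."""
    tagged = [(partition(k), k, v) for k, v in records]
    tagged.sort(key=lambda t: (t[0], t[1]))  # stable: equal keys keep input order
    return tagged


def reduce_input(map_outputs, reduce_id):
    """Merge the pre-sorted segments for one partition; no full re-sort needed."""
    segments = [
        [(k, v) for p, k, v in out if p == reduce_id]
        for out in map_outputs
    ]
    # heapq.merge assumes each segment is already sorted by key.
    return list(heapq.merge(*segments, key=itemgetter(0)))
```

For example, with map outputs built from [("b", 1), ("d", 2), ("a", 3)] and [("c", 4), ("a", 5)], reduce 0 receives b, d and reduce 1 receives a, a, c. Each partition is sorted, but the partitions are not ordered relative to one another, so the data set as a whole is not sorted.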

If your job requires total order, consider o.a.h.mapred.lib.TotalOrderPartitioner. -C
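A rough sketch of the idea behind a total-order partitioner, assuming R-1 sorted split points are already known (TotalOrderPartitioner reads these from a partition file, typically produced by sampling the input). Each key goes to the reduce whose key range contains it, so every key in reduce i is smaller than every key in reduce i+1. This is plain Python, not the Hadoop API; `range_partition` and `total_order_output` are hypothetical names.

```python
import bisect


def range_partition(key, split_points):
    """Return the reduce index for key, given R-1 sorted split points.

    Keys below split_points[0] go to reduce 0; keys >= split_points[-1]
    go to reduce R-1; keys in [split_points[i-1], split_points[i]) go to i.
    """
    return bisect.bisect_right(split_points, key)


def total_order_output(keys, split_points):
    num_reduces = len(split_points) + 1
    buckets = [[] for _ in range(num_reduces)]
    for k in keys:
        buckets[range_partition(k, split_points)].append(k)
    # Each reduce sorts its own partition independently...
    per_reduce = [sorted(b) for b in buckets]
    # ...and concatenating reduce outputs in index order is globally sorted.
    return [k for part in per_reduce for k in part]
```

The quality of the split points only affects balance between reduces, not correctness: any sorted split points give a total order, but poorly chosen ones can leave one reduce with most of the data.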

On Mar 3, 2009, at 7:24 PM, Nick Cen wrote:

Can you provide more info about sorting? Does the sort happen on the whole
data set, or just on the specified partition?

2009/3/4 Mikhail Yakshin <greycat.na....@gmail.com>

On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
This is normal behavior. The Reducer is guaranteed to receive all the results for its partition in sorted order. No reduce can start until all the maps are completed, since any running map could emit a result that would violate the order for the results it currently has. -C

_Reducers_ usually start almost immediately and begin downloading the data emitted by mappers as the mappers finish. This is their first phase. Their second
phase can start only after all mappers have completed; in it, they sort the
received data. In their third phase they do the actual reduction.
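The three phases described above can be modelled with a small Python class. It is a toy, not Hadoop's reduce task implementation, and `ReduceTaskSim` with its methods are hypothetical names. `copy()` may be called while maps are still running, `sort()` refuses to merge until every map has reported (Chris's point: a still-running map could emit a smaller key), and `reduce()` groups the merged stream by key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


class ReduceTaskSim:
    """Toy model of one reduce task's copy, sort and reduce phases."""

    def __init__(self, num_maps):
        self.num_maps = num_maps
        self.fetched = []  # sorted segments copied so far

    def copy(self, segment):
        # Phase 1 (copy): runs while other maps are still executing;
        # each finished map contributes one already-sorted segment.
        self.fetched.append(segment)

    def ready(self):
        return len(self.fetched) == self.num_maps

    def sort(self):
        # Phase 2 (sort/merge): only safe once every map has reported,
        # because a map that is still running could emit a smaller key.
        if not self.ready():
            raise RuntimeError("cannot merge before all maps have finished")
        return list(heapq.merge(*self.fetched, key=itemgetter(0)))

    def reduce(self, fn):
        # Phase 3 (reduce): call fn once per key with all of that key's values.
        return [
            (key, fn(v for _, v in group))
            for key, group in groupby(self.sort(), key=itemgetter(0))
        ]
```

For instance, after copying only [("b", 1), ("d", 1)] from the first of two maps, `ready()` is still false; once the second map's [("a", 1), ("b", 1)] arrives, `reduce(sum)` yields a, b and d with totals 1, 2 and 1. The real merge has to cope with data that does not fit in memory; this toy ignores that.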

--
WBR, Mikhail Yakshin




--
http://daily.appspot.com/food/
