Re: Reduce doesn't start until map finishes

Nick Cen Tue, 03 Mar 2009 20:56:32 -0800

Thanks. About the "Secondary Sort": can you provide an example? What do
the intermediate keys stand for?


Assume I have two mappers, m1 and m2. The output of m1 is (k1,v1),(k2,v2) and
the output of m2 is (k1,v3),(k2,v4). Assume k1 and k2 belong to the same
partition and k1 < k2, so I think the order inside the reducer may be:
(k1,v1)
(k1,v3)
(k2,v2)
(k2,v4)

Can the Secondary Sort change this order?
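
To make the question concrete, here is a small Python sketch of the scenario
above. It is only a simulation, not Hadoop code: merge_by_key and
secondary_sort are hypothetical helper names, and it assumes both keys land in
one partition. The plain merge compares keys only, so the relative order of
v1 and v3 under k1 depends on the order in which the map outputs happen to be
merged. The secondary-sort variant orders records on the (key, value) pair
and still groups on the key alone, which is roughly what a composite key plus
a grouping comparator would do in a real job.

```python
import heapq
from itertools import groupby

# Output of each map for one partition, already sorted by key.
m1 = [("k1", "v1"), ("k2", "v2")]
m2 = [("k1", "v3"), ("k2", "v4")]

def merge_by_key(*map_outputs):
    """Merge sorted map outputs, comparing the key only (values are ignored)."""
    return list(heapq.merge(*map_outputs, key=lambda kv: kv[0]))

def secondary_sort(*map_outputs):
    """Sort on the (key, value) pair, then group on the key alone."""
    composite = sorted(kv for out in map_outputs for kv in out)
    return [(k, [v for _, v in group])
            for k, group in groupby(composite, key=lambda kv: kv[0])]

merged = merge_by_key(m1, m2)      # keys are in order; value order is incidental
grouped = secondary_sort(m2, m1)   # value order no longer depends on input order
```

In this sketch, secondary_sort returns the same grouping whichever map output
is passed first, while the value order inside merge_by_key follows the
argument order.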



2009/3/4 Chris Douglas <chri...@yahoo-inc.com>

> The output of each map is sorted by partition and by key within that
> partition. The reduce merges sorted map output assigned to its partition
> into the reduce. The following may be helpful:
>
> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>
> If your job requires total order, consider
> o.a.h.mapred.lib.TotalOrderPartitioner. -C
>
>
> On Mar 3, 2009, at 7:24 PM, Nick Cen wrote:
>
>> Can you provide more info about sorting? Does the sort happen on the whole
>> data set, or just on the specified partition?
>>
>> 2009/3/4 Mikhail Yakshin <greycat.na....@gmail.com>
>>
>>> On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
>>>
>>>> This is normal behavior. The Reducer is guaranteed to receive all the
>>>> results for its partition in sorted order. No reduce can start until all
>>>> the maps are completed, since any running map could emit a result that
>>>> would violate the order for the results it currently has. -C
>>>>
>>>
>>> _Reducers_ usually start almost immediately and start downloading data
>>> emitted by mappers as they go. This is their first phase. Their second
>>> phase can start only after completion of all mappers. In their second
>>> phase, they're sorting received data, and in their third phase they're
>>> doing real reduction.
>>>
>>> --
>>> WBR, Mikhail Yakshin
>>>
>>>
>>
>>
>> --
>> http://daily.appspot.com/food/
>>
>
>


-- 
http://daily.appspot.com/food/
