Interesting article. It claims to have the same fault tolerance, but I don't
see any explanation of how that can be.

If a single mapper fails part-way through a task after it has transmitted
partial results to a reducer, the whole job is corrupted. With the current
barrier between map and reduce, a job can recover from partially completed
tasks and can speculatively execute.
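To make the failure mode concrete, here is a toy sketch (plain Python, not Hadoop internals; the record data and word-count job are made up for illustration). With a barrier, a dead map attempt is simply re-run and the reducer never sees the failed attempt's output; with naive pipelining, the reducer has already folded in partial results, so a retry double-counts them:

```python
# Toy illustration of why a materialized map output ("barrier") makes
# mapper re-execution safe, while streaming partial results to a live
# reducer does not. Not Hadoop code -- a simplified word-count job.

def run_map(records, fail_after=None):
    """Map task: emit (word, 1) pairs; optionally die part-way through."""
    out = []
    for i, rec in enumerate(records):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("mapper died mid-task")
        for word in rec.split():
            out.append((word, 1))
    return out

def reduce_counts(pairs):
    """Reduce: sum the values for each key."""
    counts = {}
    for key, val in pairs:
        counts[key] = counts.get(key, 0) + val
    return counts

records = ["a b", "a c", "b b"]   # correct counts: a=2, b=3, c=1

# Barrier: output reaches the reducer only after the map task completes,
# so a failed attempt is discarded and re-executed from scratch.
try:
    pairs = run_map(records, fail_after=1)   # first attempt dies
except RuntimeError:
    pairs = run_map(records)                 # clean re-execution
barrier_result = reduce_counts(pairs)        # a=2, b=3, c=1 (correct)

# Pipelined: the reducer consumes pairs as they stream in. When the
# mapper dies and is naively retried, the pairs it emitted before the
# failure are counted again.
streamed = []
try:
    for i, rec in enumerate(records):
        if i == 1:
            raise RuntimeError("mapper died mid-task")
        for word in rec.split():
            streamed.append((word, 1))       # reducer already has these
except RuntimeError:
    streamed.extend(run_map(records))        # retry replays everything
pipelined_result = reduce_counts(streamed)   # a and b over-counted

print(barrier_result)
print(pipelined_result)
```

A real pipelined design (as in the HOP prototype) has to add bookkeeping so the reducer can discard output from failed attempts, which is exactly the machinery the barrier gives you for free.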

I would imagine that small, low-latency tasks can benefit greatly from such an
approach, but larger tasks need the barrier or will not be very fault tolerant.
However, there are still a lot of optimizations to do in Hadoop for low-latency
tasks while maintaining the barrier.


On Mar 4, 2010, at 2:18 PM, Jeff Hammerbacher wrote:

> Also see "Breaking the MapReduce Stage Barrier" from UIUC:
> http://www.ideals.illinois.edu/bitstream/handle/2142/14819/breaking.pdf
> 
> On Thu, Mar 4, 2010 at 11:41 AM, Ashutosh Chauhan <
> ashutosh.chau...@gmail.com> wrote:
> 
>> Bharath,
>> 
>> This idea is kicking around in academia.. not made into apache yet..
>> https://issues.apache.org/jira/browse/MAPREDUCE-1211
>> 
>> You can get a working prototype from:
>> http://code.google.com/p/hop/
>> 
>> Ashutosh
>> 
>> On Thu, Mar 4, 2010 at 09:06, E. Sammer <e...@lifeless.net> wrote:
>>> On 3/4/10 12:00 PM, bharath v wrote:
>>>> 
>>>> Hi ,
>>>> 
>>>> Can we pipeline the map output directly into reduce phase without
>>>> storing it in the local filesystem (avoiding disk IOs).
>>>> If yes , how to do that ?
>>> 
>>> Bharath:
>>> 
>>> No, there's no way to avoid going to disk after the mappers.
>>> 
>>> --
>>> Eric Sammer
>>> e...@lifeless.net
>>> http://esammer.blogspot.com
>>> 
>> 