Hi,

Thanks Benjamin and Bibek for the detailed explanations and pointers. The question came up after reading the paper Real-time MapReduce Scheduling (http://repository.upenn.edu/cis_reports/942/). In its experimental setup, the authors state that they disabled both speculative execution and pipelining, so I was wondering how to enforce the latter.
-bikash

On Tue, Mar 1, 2011 at 9:48 AM, Benjamin Gufler <benjamin.guf...@tum.de> wrote:
> On 2011-03-01 15:42, Bibek Paudel wrote:
>
>> On Tue, Mar 1, 2011 at 3:27 PM, Benjamin Gufler <benjamin.guf...@tum.de> wrote:
>>
>>>> Is there a way to disable the use of pipelining, i.e., so that the reduce phase is started only after the map phase is completed?
>>>>
>>> You need to configure the mapred.reduce.slowstart.completed.maps property in mapred-site.xml. It gives the fraction (0.0 to 1.0) of mappers that must be complete before the first reducers are launched. By setting it to 1, you should obtain the wanted behaviour.
>>>
>> I think this only schedules the reducers, and the scheduled reducers start the "copy" (followed by "sort") stages. The actual "reduce" functions are called only after all the intermediate data from all mappers has been copied over.
>>
>
> The "reduce" functions cannot be called earlier anyway, as the last mapper to complete might produce output that must be processed on the first reduce invocation. So, if it was not about the early copying and sorting, I think I didn't get your initial question, sorry.
>
> Benjamin
>
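To make the threshold concrete, here is a small Python model of the behaviour discussed above. It is only a sketch under stated assumptions: it assumes the scheduler launches reducers once ceil(slowstart * number_of_maps) map tasks have finished, and that the shipped default of mapred.reduce.slowstart.completed.maps is 0.05. The function names are hypothetical, made up for illustration, and are not Hadoop APIs. The model also encodes Bibek's and Benjamin's point that reduce() itself can only run once every map has finished, whatever the slowstart value.

```python
import math

# Assumed shipped default of mapred.reduce.slowstart.completed.maps (5% of maps).
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reducers(total_maps: int, slowstart: float = DEFAULT_SLOWSTART) -> int:
    """Hypothetical helper: completed maps required before reducers are launched.

    Assumes the threshold is ceil(slowstart * total_maps).
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


def reducers_may_start(completed_maps: int, total_maps: int,
                       slowstart: float = DEFAULT_SLOWSTART) -> bool:
    """Hypothetical helper: True once reducers (copy/sort stages) may be launched."""
    return completed_maps >= maps_needed_before_reducers(total_maps, slowstart)


def reduce_function_may_run(completed_maps: int, total_maps: int) -> bool:
    """Hypothetical helper: reduce() needs every map's output, independent of slowstart."""
    return completed_maps == total_maps


if __name__ == "__main__":
    total = 20
    for slowstart in (DEFAULT_SLOWSTART, 1.0):
        # First point in the map phase at which reducers would be launched.
        first = next(done for done in range(total + 1)
                     if reducers_may_start(done, total, slowstart))
        print(f"slowstart={slowstart}: reducers launched after {first}/{total} maps")
```

With the value at 1.0, reducers (and hence their copy and sort stages) are only launched after every map has finished, so the shuffle no longer overlaps the map phase. In either case, reduce() never starts before the last map completes.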