Check the `mapreduce.job.reduce.slowstart.completedmaps` parameter. The
reducers cannot start processing the map output until all the map tasks
are complete, but they can start fetching that output from the nodes on
which map tasks have already completed. This parameter sets the fraction
of map tasks that must finish before the reducers are scheduled.
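
As a rough illustration of what that fraction means, here is a minimal
sketch in plain Java with no Hadoop dependency. The class and method names
are hypothetical, and the rounding rule (ceiling of fraction times total
maps) is an assumption about how the threshold is derived, not a copy of
Hadoop's scheduler code.

```java
public class SlowstartSketch {

    // Hypothetical helper: how many map tasks must complete before
    // reducers may be scheduled, ASSUMING the threshold is
    // ceil(fraction * totalMaps).
    static int mapsNeededBeforeReducers(double fraction, int totalMaps) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        return (int) Math.ceil(fraction * totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 200; // illustrative job size, not from the thread
        double[] fractions = {0.0, 0.05, 0.5, 1.0};
        for (double f : fractions) {
            // A fraction of 1.0 means reducers wait for every map to finish;
            // smaller values let the copy phase overlap with running maps.
            System.out.println("slowstart=" + f + " -> reducers scheduled after "
                    + mapsNeededBeforeReducers(f, totalMaps) + " completed maps");
        }
    }
}
```

In a real job the value would normally be set in the job configuration,
for example with `conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.5f)`
in the driver. Whatever the value, only the fetch of map output overlaps
with the map phase; the reduce function itself still waits for all maps.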

Praveen

On Thu, Dec 29, 2011 at 12:44 AM, Prashant Kommireddi
<prash1...@gmail.com> wrote:

> By design reduce would start only after all the maps finish. There is
> no way for the reduce to begin grouping/merging by key unless all the
> maps have finished.
>
> Sent from my iPhone
>
> On Dec 28, 2011, at 8:53 AM, JAGANADH G <jagana...@gmail.com> wrote:
>
> > Hi All,
> >
> > I wrote a MapReduce program to fetch data from MySQL and process the
> > data (word count).
> > The program executes successfully, but I noticed that the reduce task
> > starts only after the map task finishes.
> > Is there any way to run the map and reduce in parallel?
> >
> > The program fetches data from MySQL and writes the processed output to
> > HDFS.
> > I am using Hadoop in pseudo-distributed mode.
> > --
> > **********************************
> > JAGANADH G
> > http://jaganadhg.in
> > *ILUGCBE*
> > http://ilugcbe.org.in
>
