Hi Cliff,

Thanks, it did turn out to be speculative execution. When I turned it off, no
more tasks were killed, but overall performance degraded.
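
In case it helps anyone else reading the archives, these are the standard
properties for disabling speculative execution on 0.20-era Hadoop (newer
versions renamed them to mapreduce.map.speculative and
mapreduce.reduce.speculative):

```xml
<!-- mapred-site.xml: disable speculative execution for map and reduce
     tasks (property names are the Hadoop 0.20.x mapred ones) -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```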

So my initial assumptions were incorrect after all. I guess I'll have to
look at other ways to improve performance.

Thanks for the help.
-aniket

On Thu, Sep 23, 2010 at 5:14 PM, cliff palmer <palmercl...@gmail.com> wrote:

> Aniket, I wonder if these tasks were run as speculative execution. Have you
> been able to determine whether the job runs successfully?
> HTH
> Cliff
>
> On Thu, Sep 23, 2010 at 12:52 AM, aniket ray <aniket....@gmail.com> wrote:
>
> > Hi,
> >
> > I continuously run a series of batch jobs using Hadoop MapReduce. I also
> > have a managing daemon that moves data around on HDFS, making way for
> > more jobs to be run.
> > I use the capacity scheduler to schedule many jobs in parallel.
> >
> > I see an issue on the Hadoop web monitoring UI at port 50030 which I
> > believe may be causing a performance bottleneck, and I wanted to get
> > more information.
> >
> > Approximately 10% of the reduce tasks show up as "Killed" in the UI. The
> > logs say that the killed tasks are in the shuffle phase when they are
> > killed, but the logs don't show any exception.
> > My understanding is that these killed tasks would be started again, and
> > this slows down the whole Hadoop job.
> > I was wondering what the possible issues may be and how to debug this
> > issue.
> >
> > I have tried both Hadoop 0.20.2 and the latest version of Hadoop from
> > Yahoo's github.
> > I've monitored the nodes and there is a lot of free disk space and
> > memory on all nodes (more than 1 TB free disk and 5 GB free memory at
> > all times on all nodes).
> >
> > Since there are no exceptions or other visible issues, I am finding it
> > hard to figure out what the problem might be. Could anybody help?
> >
> > Thanks,
> > -aniket
> >
>
