The number one cause of this is something that makes the connection used to
fetch a map output fail. I have seen:
1) a firewall blocking the fetch
2) misconfigured IP addresses (i.e. the tasktracker attempting the fetch
received an incorrect IP address when it looked up the name of the
tasktracker holding the map segment); a quick check for 1) and 2) is
sketched just after this list
3) rarely, the HTTP server on the serving tasktracker is overloaded due to
insufficient threads or listen backlog; this can happen if the number of
fetches per reduce is large and the number of reduces or the number of maps
is very large (see the tuning sketch after the next paragraph)
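
As a quick way to rule out 1) and 2), you can try to resolve and connect to
every tasktracker's HTTP port from every node. The host names and the port
(50060 is the usual tasktracker HTTP default) in this sketch are my own
placeholders, not something from this thread:

    // Minimal reachability check for the shuffle fetch path (causes 1 and 2).
    // Host names and port are placeholders; substitute your own cluster.
    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class CheckShuffleReachability {
        public static void main(String[] args) throws Exception {
            String[] taskTrackers = {"node1", "node2", "node3"}; // hypothetical hosts
            int httpPort = 50060; // default tasktracker HTTP port

            for (String host : taskTrackers) {
                // Cause 2: confirm the name resolves to the address you expect.
                InetAddress addr = InetAddress.getByName(host);
                System.out.println(host + " resolves to " + addr.getHostAddress());

                // Cause 1: confirm the port is reachable (no firewall in the way).
                Socket s = new Socket();
                try {
                    s.connect(new InetSocketAddress(addr, httpPort), 5000);
                    System.out.println(host + ":" + httpPort + " reachable");
                } catch (Exception e) {
                    System.out.println(host + ":" + httpPort + " NOT reachable: " + e);
                } finally {
                    s.close();
                }
            }
        }
    }

Run it from each node so you cover every direction a fetch can take, not
just one.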

There are probably other cases. This recently happened to me when I had 6000
maps and 20 reducers on a 10-node cluster, which I believe was case 3 above.
Since I didn't actually need the reduce output (I got my summary data via
counters in the map phase), I never re-tuned the cluster.
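
If it does turn out to be case 3, the knobs I would look at are the
tasktracker's shuffle-serving thread count and the number of parallel copies
each reduce runs. The property names below are the 0.19/0.20-era ones as I
recall them (tasktracker.http.threads, default 40, and
mapred.reduce.parallel.copies, default 5); verify them against your release,
and note that the thread count is a daemon setting that normally lives in
the cluster config files and needs a tasktracker restart, not something you
set per job:

    // Hedged sketch only: shows the property names, not a drop-in fix.
    import org.apache.hadoop.conf.Configuration;

    public class ShuffleTuningSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Server side: more threads for serving map output fetches.
            // Normally set in the tasktracker's config file, then restart.
            conf.setInt("tasktracker.http.threads", 80);
            // Client side: fewer parallel copies per reduce lowers the
            // fetch load each reducer puts on the serving tasktrackers.
            conf.setInt("mapred.reduce.parallel.copies", 5);
            System.out.println("tasktracker.http.threads = "
                + conf.getInt("tasktracker.http.threads", 40));
        }
    }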

On Wed, Aug 19, 2009 at 11:25 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> I think that the problem that I am remembering was due to poor recovery
> from this problem.  The underlying fault is likely due to poor connectivity
> between your machines.  Test that all members of your cluster can access
> all others on all ports used by hadoop.
>
> See here for hints: http://markmail.org/message/lgafou6d434n2dvx
>
> On Wed, Aug 19, 2009 at 10:39 PM, yang song <hadoop.ini...@gmail.com> wrote:
>
> >    Thank you, Ted. Updating the current cluster would be a huge amount of
> > work, and we don't want to do that. Could you tell me in detail how 0.19.1
> > causes certain failures?
> >    Thanks again.
> >
> > 2009/8/20 Ted Dunning <ted.dunn...@gmail.com>
> >
> > > I think I remember something about 0.19.1 in which certain failures
> > > would cause this.  Consider using an updated 0.19 or moving to 0.20 as
> > > well.
> > >
> > > On Wed, Aug 19, 2009 at 5:19 AM, yang song <hadoop.ini...@gmail.com>
> > > wrote:
> > >
> > > > I'm sorry, the version is 0.19.1
>
> --
> Ted Dunning, CTO
> DeepDyve
>



-- 
Pro Hadoop, a book to guide you from beginner to Hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
