On Fri, Mar 28, 2008 at 12:31 AM, 朱盛凯 <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I met this problem in my cluster before, so I can share some of my
> experience with you, though it may not apply in your case.
>
> The job in my cluster always hung at 16% of the reduce phase. It occurred
> because the reduce tasks could not fetch the map output from other nodes.
>
> In my case, two factors can cause this failure of communication between
> two task trackers.
>
> One is that a firewall blocks the trackers from communicating with each
> other. I solved this by disabling the firewall.
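>
> For example, on a Red Hat-style Linux node the firewall can usually be
> disabled like this (only an illustrative sketch; the exact commands depend
> on your distribution and firewall setup):
>
>     /etc/init.d/iptables stop      # stop the firewall now
>     chkconfig iptables off         # keep it from starting again at boot
>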
> The other factor is that the trackers refer to other nodes by host name
> only, not by IP address. I solved this by editing the file /etc/hosts to
> map the host name of every node in the cluster to its IP address.


I met this problem for the same reason too.
Try adding the host names of all nodes to the /etc/hosts file on every
machine; a sketch of such a file is below.
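
A minimal sketch of such an /etc/hosts file, assuming one master and three
slaves (the host names and IP addresses here are only placeholders for your
own):

    # /etc/hosts -- keep an identical copy on every node in the cluster
    127.0.0.1     localhost
    192.168.1.10  hadoop-master
    192.168.1.11  hadoop-slave1
    192.168.1.12  hadoop-slave2
    192.168.1.13  hadoop-slave3

Also make sure a node's own host name is not mapped to 127.0.0.1, or the
task tracker may advertise an address that the other nodes cannot reach.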

>
>
> I hope my experience will be helpful for you.
>
> On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I have a small Hadoop cluster: one master and three slaves.
> > When I try the example wordcount on one of our log files (size ~350 MB),
> > map runs fine, but reduce always hangs (sometimes around 19%, 60%, ...);
> > only after a very long time does it finish.
> > I am seeing this error
> > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> > In the log I am seeing this
> > INFO org.apache.hadoop.mapred.TaskTracker:
> > task_200803261535_0001_r_000000_0 0.18333334% reduce > copy (11 of 20 at
> > 0.02 MB/s) >
> >
> > Do you know what might be causing the problem?
> > Thanks,
> > Senthil
> >
> >
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
