Hi,
Thanks for your suggestions.

It looks like the problem is with the firewall. I created a firewall rule to
allow ports 50000 to 50100 (I found that Hadoop was listening in this port
range).
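For reference, the rule I added was roughly equivalent to the following
(assuming iptables here; the exact chain and syntax depend on the distribution):

    # accept incoming TCP connections on the range Hadoop appeared to be using
    iptables -A INPUT -p tcp --dport 50000:50100 -j ACCEPT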

It looks like I am missing some ports, and those get blocked by the firewall.

Could anyone please let me know how to configure Hadoop to use only certain
specified ports, so that those ports can be allowed through the firewall?
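For example, I am guessing it involves properties along these lines in
hadoop-site.xml, but I am not sure whether this covers all of the ports Hadoop
opens (the property names and values below are only my assumption):

    <!-- illustrative only: pin some of the well-known daemon ports -->
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50010</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50075</value>
    </property>
    <property>
      <name>mapred.task.tracker.http.address</name>
      <value>0.0.0.0:50060</value>
    </property>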

Thanks,
Senthil

-----Original Message-----
From: 朱盛凯 [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 27, 2008 12:32 PM
To: core-user@hadoop.apache.org
Subject: Re: Reduce Hangs

Hi,

I ran into this problem in my cluster before, so I can share some of my
experience with you, though it may not apply in your case.

The job in my cluster always hung at 16% of the reduce phase. It occurred
because the reduce task could not fetch the map output from the other nodes.

In my case, two factors could cause this failure of communication between
task trackers.

One is that the firewall blocks the trackers from communicating with each
other; I solved this by disabling the firewall.
The other is that the trackers refer to other nodes by hostname only, not by
IP address; I solved this by editing /etc/hosts on every node in the cluster
to map each node's hostname to its IP address.
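For example (placeholder hostnames and addresses):

    192.168.1.10   master
    192.168.1.11   slave1
    192.168.1.12   slave2
    192.168.1.13   slave3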

I hope my experience is helpful to you.

On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have a small Hadoop cluster: one master and three slaves.
> I am running the example wordcount on one of our log files (size ~350 MB).
>
> The map runs fine, but the reduce always hangs (sometimes around 19%, 60%,
> ...); after a very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker:
> task_200803261535_0001_r_000000_0 0.18333334% reduce > copy (11 of 20 at
> 0.02 MB/s) >
>
> Do you know what the problem might be?
> Thanks,
> Senthil
>
>
