Hi,

Thanks for your suggestions. It looks like the problem is with the firewall. I created a firewall rule to allow ports 50000 to 50100 (I found Hadoop listening in that port range), but it looks like I am still missing some ports, and those get blocked by the firewall.

Could anyone please let me know how to configure Hadoop to use only certain specified ports, so that those ports can be allowed in the firewall?
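For example, I am guessing it would be something along these lines in hadoop-site.xml (the property names are what I see in hadoop-default.xml of my release and the ports are only examples -- please correct me if these are not the right properties):

    <!-- Pin the daemon ports to fixed values so the firewall can allow exactly these. -->
    <property>
      <name>mapred.task.tracker.http.address</name>
      <value>0.0.0.0:50060</value>  <!-- port the reducers fetch map output from -->
    </property>
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50010</value>  <!-- datanode data transfer port -->
    </property>

and then, on each node, allow only those ports from the other cluster machines, e.g. with iptables (192.168.1.0/24 is just a placeholder for the cluster subnet):

    iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 50060 -j ACCEPT
    iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 50010 -j ACCEPT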
Thanks,
Senthil

-----Original Message-----
From: 朱盛凯 [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 27, 2008 12:32 PM
To: core-user@hadoop.apache.org
Subject: Re: Reduce Hangs

Hi,

I met this problem in my cluster before, so I can share some of my experience with you, though it may not apply in your case.

The jobs in my cluster always hung at 16% of the reduce. This occurred because the reduce tasks could not fetch the map output from other nodes.

In my case, two factors could cause this failure of communication between two task trackers.

One is that the firewall blocks the trackers from communicating with each other. I solved this by disabling the firewall.

The other is that the trackers refer to other nodes by host name only, not by IP address. I solved this by editing /etc/hosts with mappings from hostname to IP address for all nodes in the cluster.

I hope my experience will be helpful for you.

On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> map runs fine but reduce always hangs (sometimes around 19%, 60%, ...);
> after a very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker: task_200803261535_0001_r_000000_0
> 0.18333334% reduce > copy (11 of 20 at 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
>
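(For reference, the /etc/hosts mapping suggested above would look something like this on every node; the IP addresses and host names below are only placeholders for your own cluster:

    192.168.1.10   master
    192.168.1.11   slave1
    192.168.1.12   slave2
    192.168.1.13   slave3
)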