Hi, I ran into this problem in my cluster before, so I can share some of my experience, though it may not apply in your case.
The job in my cluster always hung at 16% of the reduce phase. It occurred because the reduce tasks could not fetch the map output from other nodes. In my case, two factors caused this communication failure between task trackers. One was the firewall blocking the trackers from communicating with each other; I solved this by disabling the firewall. The other was that the trackers referred to other nodes by hostname only, not by IP address; I solved this by editing /etc/hosts to map the hostname of every node in the cluster to its IP address. I hope my experience is helpful for you.

On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> Map runs fine but reduce always hangs (sometimes around 19%, 60% ...); after
> a very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker:
> task_200803261535_0001_r_000000_0 0.18333334% reduce > copy (11 of 20 at
> 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
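For reference, the two fixes I described above looked roughly like the sketch below. The hostnames, IP addresses, and firewall commands are only examples (I'm assuming a RHEL/CentOS-style system with iptables); substitute your own cluster's addresses and your distro's firewall tooling.

```shell
# Run as root on EVERY node in the cluster.

# 1. Map each node's hostname to its IP address in /etc/hosts, so the
#    task trackers can resolve one another by name.
#    (Example hostnames/IPs -- replace with your own.)
cat >> /etc/hosts <<'EOF'
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
192.168.1.13  slave3
EOF

# 2. Disable the firewall so reduce tasks can reach the map-output
#    server port on the other task trackers (RHEL/CentOS-era commands):
service iptables stop       # stop the firewall now
chkconfig iptables off      # keep it off across reboots
```

Disabling the firewall entirely is the blunt fix; on a cluster exposed beyond a trusted network you would instead open only the Hadoop ports between the nodes.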