Shi,

The key here is the mapper stuck at 99%. Nothing can move on until all
mappers complete.
Is it possible that the larger data set has an incomplete record, or
something similar, at the end?

Kevin
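
One rough way to test the incomplete-record idea is to check whether an input file ends on a record delimiter. The sketch below is only an illustration: it assumes newline-delimited text records and a local copy of the input file, neither of which is stated in the thread. The function name ends_with_delimiter is hypothetical, and the check would not apply to binary formats such as SequenceFiles.

```python
import os


def ends_with_delimiter(path, delimiter=b"\n"):
    """Return True if the file at `path` ends with `delimiter`.

    Assumption: records are text lines terminated by `delimiter`.
    An empty file is treated as having no partial record.
    """
    size = os.path.getsize(path)
    if size == 0:
        return True
    if size < len(delimiter):
        return False
    with open(path, "rb") as f:
        # Read only the last few bytes so this stays cheap on huge files.
        f.seek(-len(delimiter), os.SEEK_END)
        return f.read(len(delimiter)) == delimiter
```

A False result only shows that the final line is unterminated; it does not prove that this is what stalls the mapper.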

-----Original Message-----
From: Shi Yu [mailto:sh...@uchicago.edu] 
Sent: Thursday, March 24, 2011 3:02 PM
To: hadoop user
Subject: Program freezes at Map 99% Reduce 33%

I am running a Hadoop program that processes terabyte-scale data. The
code was tested successfully on a small sample (100G). However, when I
run it on the full data set, the program freezes forever at Map 99%
Reduce 33%. There is no error, and the userlog folder is small (<30M);
if tasks were failing, it would contain gigabytes of error logs.

I checked the mapper and reducer logs. It seems that the reducer is
waiting for a map output that never arrives. What could cause this?
Most of the configuration is left at the defaults. I set
"mapred.child.java.opts=-Xmx2000M
hadoop.job.history.user.location=none". The problem occurs on both
0.19.2 and 0.20.2. Thanks!

Example of Mapper logs:

2011-03-24 12:37:22,775 INFO org.apache.hadoop.mapred.Merger: Merging 3
sorted segments
2011-03-24 12:37:22,776 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 3 segments left of total size: 461743 bytes
2011-03-24 12:37:22,885 INFO org.apache.hadoop.mapred.MapTask: Index: 
(11015008, 10030254, 607594)
2011-03-24 12:37:22,889 INFO org.apache.hadoop.mapred.TaskRunner: 
Task:attempt_201103231501_0007_m_000286_0 is done. And is in the process
of commiting
2011-03-24 12:37:22,897 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201103231501_0007_m_000286_0' done.

Example of Reducer logs:

2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts
and0 dup hosts)
2011-03-24 13:51:18,544 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201103231501_0007_r_000018_0 Need another 1 map output(s) where
0 is already in progress
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts
and0 dup hosts)


Shi
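
The reducer lines above report how many map outputs each reduce attempt is still waiting for ("Need another 1 map output(s)"). A small script can pull that number out of a large log. This is a minimal sketch that assumes the log text matches the format pasted above; the function name outstanding_map_outputs is hypothetical and is not part of Hadoop.

```python
import re

# Matches e.g. "attempt_..._r_000018_0 Need another 1 map output(s)".
# \s+ is used between words so that wrapped lines still match.
NEED_RE = re.compile(
    r"(attempt_\w+)\s+Need\s+another\s+(\d+)\s+map\s+output\(s\)"
)


def outstanding_map_outputs(log_text):
    """Map each reduce attempt ID to the most recent count of map
    outputs it reported as still missing."""
    latest = {}
    for match in NEED_RE.finditer(log_text):
        # Later log entries overwrite earlier ones for the same attempt.
        latest[match.group(1)] = int(match.group(2))
    return latest
```

If every reduce attempt reports the same small count, that would be consistent with a single map task that has not finished, which is what the 99% figure suggests.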
