Shi,

The key here is the map phase sitting at 99%. Nothing can move on until all mappers complete. Is it possible that the data in your larger set has an incomplete record, or something similar, at the end?
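One quick way to test that hypothesis is sketched below. It assumes the input is text with newline-terminated records and that the tail of the last input file has been copied to a local path; neither assumption is stated in the thread, and the function name `ends_with_complete_record` is hypothetical.

```python
import os


def ends_with_complete_record(path, terminator=b"\n"):
    """Return True if the file at `path` ends with `terminator`.

    Assumption: records are delimited by `terminator` (newline by default).
    Only the final bytes are read, so this is cheap even for large files.
    An empty file is reported as False because it holds no terminated record.
    """
    size = os.path.getsize(path)
    if size < len(terminator):
        return False
    with open(path, "rb") as f:
        # Jump to the last len(terminator) bytes of the file.
        f.seek(-len(terminator), os.SEEK_END)
        return f.read(len(terminator)) == terminator
```

A False result only shows that the last record is not terminated; it does not by itself prove that this is what stalls the remaining mapper.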
Kevin

-----Original Message-----
From: Shi Yu [mailto:sh...@uchicago.edu]
Sent: Thursday, March 24, 2011 3:02 PM
To: hadoop user
Subject: Program freezes at Map 99% Reduce 33%

I am running a Hadoop program that processes terabyte-scale data. The code was tested successfully on a small sample (100G). However, when I try it on the full problem, the program freezes forever at Map 99% Reduce 33%. There is no error, and the userlog folder is small (<30M); if tasks were failing, it would contain gigabytes of error logs. I checked the mapper and reducer logs, and it seems that the reducer is waiting for an output from a mapper that never arrives. What could be causing this? Most of the configuration is left at the defaults. I set "mapred.child.java.opts=-Xmx2000M hadoop.job.history.user.location=none". The problem occurs on both 0.19.2 and 0.20.2.

Thanks!

Example of Mapper logs:

2011-03-24 12:37:22,775 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2011-03-24 12:37:22,776 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 461743 bytes
2011-03-24 12:37:22,885 INFO org.apache.hadoop.mapred.MapTask: Index: (11015008, 10030254, 607594)
2011-03-24 12:37:22,889 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201103231501_0007_m_000286_0 is done. And is in the process of commiting
2011-03-24 12:37:22,897 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201103231501_0007_m_000286_0' done.

Example of Reducer logs:

2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2011-03-24 13:51:18,544 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Need another 1 map output(s) where 0 is already in progress
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)

Shi
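
The reducer log quoted above reports "Need another 1 map output(s)", which fits a single unfinished map task holding up the shuffle. As a rough sketch, a small script can pull that count out of each reducer's log to confirm that every reducer is waiting on the same number of outputs. The regular expression is written against the log lines quoted in this thread only and may not match other Hadoop versions; the function name `pending_map_outputs` is hypothetical.

```python
import re

# Matches lines such as:
#   "... ReduceTask: attempt_..._r_000018_0 Need another 1 map output(s) ..."
# Assumption: the format is exactly as in the log excerpt quoted above.
NEED_PATTERN = re.compile(
    r"(attempt_\d+_\d+_r_\d+_\d+) Need another (\d+) map output\(s\)"
)


def pending_map_outputs(log_lines):
    """Map each reduce attempt ID to the most recent 'Need another N' count."""
    pending = {}
    for line in log_lines:
        match = NEED_PATTERN.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last count wins.
            pending[match.group(1)] = int(match.group(2))
    return pending


sample = [
    "2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: "
    "attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs",
    "2011-03-24 13:51:18,544 INFO org.apache.hadoop.mapred.ReduceTask: "
    "attempt_201103231501_0007_r_000018_0 Need another 1 map output(s) "
    "where 0 is already in progress",
]
print(pending_map_outputs(sample))
# prints {'attempt_201103231501_0007_r_000018_0': 1}
```

If every reducer reports the same small count that never decreases, the next place to look is the one map task that has not been reported as done.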