Re: Program freezes at Map 99% Reduce 33%

Shi Yu, Thu, 24 Mar 2011 12:54:51 -0700

Hi Kevin,

thanks for the reply. I can hardly imagine how an incomplete record could occur here. The mapper is very simple: it reads the input line by line as Strings, splits each line on tabs, and outputs a Text pair as the key for sorting and secondary sorting. If there were an incomplete record, there should be an error, and the only place that could happen is the tab-splitting stage. I use the LZO codec to compress both the mapper output and the reducer output.
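The tab-splitting step described above can be illustrated with a minimal Python sketch. The real job is a Java Hadoop mapper, so this is only an analogue of its per-line logic. It assumes a hypothetical three-field record (primary key, secondary key, value); `EXPECTED_FIELDS`, `map_line` and `run` are illustrative names, not part of Shi's code. Instead of raising, it counts incomplete records, so a truncated final line would show up as a number rather than as a failed task.

```python
# Hypothetical sketch: EXPECTED_FIELDS and the field layout are assumptions.
EXPECTED_FIELDS = 3  # primary key, secondary key, value


def map_line(line):
    """Split one tab-separated record into ((primary, secondary), value).

    Returns None instead of raising when the record has too few fields,
    e.g. a line truncated at the end of an input file.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < EXPECTED_FIELDS:
        return None  # incomplete or malformed record
    primary, secondary, value = fields[0], fields[1], fields[2]
    return (primary, secondary), value


def run(lines):
    """Map every line; return the emitted pairs and the number of skipped records."""
    emitted, skipped = [], 0
    for line in lines:
        result = map_line(line)
        if result is None:
            skipped += 1
        else:
            emitted.append(result)
    return emitted, skipped


emitted, skipped = run(["a\tb\tc\n", "x\ty\n", "p\tq\tr"])
print(emitted, skipped)
```

One design note: in Java, `String.split("\t")` discards trailing empty strings, so a line ending in tabs yields fewer fields than expected and indexing the missing field throws `ArrayIndexOutOfBoundsException`; an explicit length check like the one above, or `split("\t", -1)`, avoids that.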

There is one issue that I think might be the reason. The log directory has the following structure. It seems I had 294 mappers. Notice that one directory is missing: "attempt_201103231501_0007_m_000292_0". Where is the output of mapper No. 292? Was it a failed node or something?

All the logs have been stuck for 2 hours (since ~12:35; the current time is 14:50). The folders with later timestamps (around 14:45) only show them because I visited them. So neither the reducers nor the mappers have generated any logs in the past two hours.

Shi

drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:36 attempt_201103231501_0007_m_000281_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:37 attempt_201103231501_0007_m_000282_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:37 attempt_201103231501_0007_m_000283_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:35 attempt_201103231501_0007_m_000284_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:37 attempt_201103231501_0007_m_000285_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 13:59 attempt_201103231501_0007_m_000286_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:37 attempt_201103231501_0007_m_000287_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 14:44 attempt_201103231501_0007_m_000288_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:37 attempt_201103231501_0007_m_000289_0
drwxr-xr-x 2 sheeyu users 85 2011-03-24 12:37 attempt_201103231501_0007_m_000289_1
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:22 attempt_201103231501_0007_m_000290_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 14:45 attempt_201103231501_0007_m_000291_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_m_000293_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000000_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000001_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000002_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000003_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000004_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000005_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000006_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000007_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 14:47 attempt_201103231501_0007_r_000008_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000009_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000010_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000011_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000012_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000013_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000014_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000015_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000016_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 12:07 attempt_201103231501_0007_r_000017_0
drwxr-xr-x 2 sheeyu users 61 2011-03-24 14:42 attempt_201103231501_0007_r_000018_0
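The gap in the listing can be checked mechanically. The Python sketch below assumes only the attempt-ID naming visible in the listing (`attempt_<jobtracker id>_<job no.>_<m|r>_<task index>_<attempt no.>`); `missing_tasks` and `retried_tasks` are hypothetical helpers. Note that a TaskTracker's userlogs directory normally holds only the attempts that ran on that node, so a missing index may simply mean the task was scheduled on another machine.

```python
import re
from collections import Counter

# Hadoop task-attempt ID as seen in the listing, e.g.
# attempt_201103231501_0007_m_000292_0 -> type 'm', task index 292, attempt 0
ATTEMPT_RE = re.compile(r"attempt_\d+_\d+_([mr])_(\d+)_(\d+)")


def _task_indices(names, task_type):
    """Yield the task index of every attempt of the given type ('m' or 'r')."""
    for name in names:
        match = ATTEMPT_RE.search(name)
        if match and match.group(1) == task_type:
            yield int(match.group(2))


def missing_tasks(names, task_type, first, last):
    """Task indices in [first, last] that have no attempt directory at all."""
    seen = set(_task_indices(names, task_type))
    return [i for i in range(first, last + 1) if i not in seen]


def retried_tasks(names, task_type):
    """Task indices that have more than one attempt directory."""
    counts = Counter(_task_indices(names, task_type))
    return sorted(i for i, n in counts.items() if n > 1)
```

Applied to the directory names above with a range of 281 to 293, it should report map task 292 as missing and map task 289 (attempts `_0` and `_1`) as retried.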


Shi

On 3/24/2011 2:25 PM, kevin.le...@thomsonreuters.com wrote:

Shi,

The key here is the 99% done mapper. Nothing can move on until all
mappers complete.
Is it possible your data in the larger set has an incomplete record or
some such at the end?

Kevin

-----Original Message-----
From: Shi Yu [mailto:sh...@uchicago.edu]
Sent: Thursday, March 24, 2011 3:02 PM
To: hadoop user
Subject: Program freezes at Map 99% Reduce 33%

I am running a Hadoop program that processes terabyte-scale data. The
code was tested successfully on a small sample (100 GB). However, when I
run it on the full data set, the program freezes forever at Map 99%
Reduce 33%. There is no error, and the userlog folder is small (<30 MB);
if there were errors, it would have grown to gigabytes of error logs.
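Why the reduce side stalls at exactly 33% can be illustrated with a simplified model. It assumes the Hadoop 0.20-era behaviour in which a reduce task's progress is split into copy (shuffle), sort and reduce phases of equal weight, and takes the figure of 294 map tasks from Shi's reply above; `reduce_progress` is a hypothetical helper, not a Hadoop API. Under that model, a reducer that has fetched all but one of 294 map outputs sits just above 33% and cannot advance until the last map output arrives.

```python
def reduce_progress(maps_copied, total_maps, sort_done=0.0, reduce_done=0.0):
    """Simplified reduce-task progress, assuming copy, sort and reduce each weigh 1/3.

    sort_done and reduce_done are fractions in [0, 1]; they stay at 0 while
    the copy phase is still waiting for map outputs.
    """
    copy_done = maps_copied / total_maps
    return (copy_done + sort_done + reduce_done) / 3


# 293 of 294 map outputs fetched; sort and reduce have not started yet
print(f"{reduce_progress(293, 294):.1%}")  # prints 33.2%
```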

I checked the mapper and reducer logs: the reducer seems to be waiting
for a map output that never arrives. What could cause this? Most of the
configuration is left at the defaults; I set
"mapred.child.java.opts=-Xmx2000M" and
"hadoop.job.history.user.location=none". The problem occurs on both
0.19.2 and 0.20.2. Thanks!

Example of Mapper logs:

2011-03-24 12:37:22,775 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2011-03-24 12:37:22,776 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 461743 bytes
2011-03-24 12:37:22,885 INFO org.apache.hadoop.mapred.MapTask: Index: (11015008, 10030254, 607594)
2011-03-24 12:37:22,889 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201103231501_0007_m_000286_0 is done. And is in the process of commiting
2011-03-24 12:37:22,897 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201103231501_0007_m_000286_0' done.

Example of Reducer logs:

2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:50:18,484 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2011-03-24 13:51:18,544 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Need another 1 map output(s) where 0 is already in progress
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0: Got 0 new map-outputs
2011-03-24 13:51:18,545 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201103231501_0007_r_000018_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
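The reducer logs can also be scanned automatically to see which reduce attempts are still waiting, and for how many map outputs. This is a minimal Python sketch, assuming one log record per line in the format shown; `pending_map_outputs` is a hypothetical helper, and the regular expression only targets the "Need another N map output(s)" message visible in these logs.

```python
import re

# Matches e.g. "attempt_201103231501_0007_r_000018_0 Need another 1 map output(s)"
NEED_RE = re.compile(r"(attempt_\S+) Need another (\d+) map output\(s\)")


def pending_map_outputs(log_lines):
    """Map each reduce attempt ID to its most recent 'Need another N' count."""
    pending = {}
    for line in log_lines:
        match = NEED_RE.search(line)
        if match:
            # later lines overwrite earlier ones, so the latest count wins
            pending[match.group(1)] = int(match.group(2))
    return pending
```

Run over the reducer lines above, it should map `attempt_201103231501_0007_r_000018_0` to 1: the reducer is waiting on a single map output.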


Shi
