Copyphase dies on timeout

Simon Willnauer Wed, 09 Dec 2009 10:31:41 -0800

Hi folks,

I do have a questions regarding a map-reduce job who gets killed
during the copy phase due to a timeout. I don't really understand the
output completely so I first look for some help to understand what is
going on.
There are about 50 mapper running where each mapper succeeds. the job
is configured with a single reduces (I assume this could be closely
related to that?!)
So this is the output:


2009-12-09 15:01:45,103 INFO org.apache.hadoop.mapred.Merger: Merging
5 sorted segments
2009-12-09 15:01:45,103 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 5 segments left of total size: 216092233
bytes
2009-12-09 15:01:50,727 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Merge of the 5 files in-memory
complete. Local file is
/data/hadoop-data/mapred/local/taskTracker/jobcache/job_200912081402_0042/attempt_200912081402_0042_r_000000_0/output/map_275.out
of size 216092229
2009-12-09 15:01:50,728 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Thread waiting: Thread for
merging on-disk files
2009-12-09 15:01:58,109 INFO org.apache.hadoop.mapred.ReduceTask: Read
53281055 bytes from map-output for
attempt_200912081402_0042_m_000253_0
2009-12-09 15:01:58,148 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_200912081402_0042_m_000253_0 -> (71, 103) from
hadoop32
2009-12-09 15:01:58,273 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Need another 44 map output(s)
where 3 is already in progress
2009-12-09 15:01:58,273 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
and41 dup hosts)
2009-12-09 15:02:40,174 INFO org.apache.hadoop.mapred.ReduceTask: Read
81297732 bytes from map-output for
attempt_200912081402_0042_m_000112_0
2009-12-09 15:02:40,175 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_200912081402_0042_m_000112_0 -> (129, 153) from
hadoop31
2009-12-09 15:02:40,175 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Scheduled 1 outputs (0 slow hosts
and16 dup hosts)
2009-12-09 15:02:40,223 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 82364774 bytes (82364778 raw bytes) into RAM from
attempt_200912081402_0042_m_000121_0
2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask: Read
82722679 bytes from map-output for
attempt_200912081402_0042_m_000196_0
2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_200912081402_0042_m_000196_0 -> (114, 137) from
hadoop21
2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Scheduled 1 outputs (0 slow hosts
and32 dup hosts)
2009-12-09 15:02:45,143 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 82639206 bytes (82639210 raw bytes) into RAM from
attempt_200912081402_0042_m_000195_0
2009-12-09 15:02:58,274 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Need another 42 map output(s)
where 3 is already in progress
2009-12-09 15:02:58,274 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
and39 dup hosts)
2009-12-09 15:03:58,275 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Need another 42 map output(s)
where 3 is already in progress
2009-12-09 15:03:58,275 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
and39 dup hosts)
......

Task attempt_200912081402_0042_r_000000_0 failed to report status for
902 seconds. Killing!

my first guess would be some IO problem but I don not have any
indicator that those logs. If somebody can clarify or give me a hint
would be great. Even if it is something like: use more reducers
because of ... <-- this is the interesting part at the moment!
btw: this is hadoop-0.20

simon

Copyphase dies on timeout

Reply via email to