> >> 2009-12-09 15:03:58,275 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
> >> and39 dup hosts)
- Reduce was started - you are probably running some kind of web crawler
(39 duplicate hosts, 0 slow hosts).

> >> Task attempt_200912081402_0042_r_000000_0 failed to report status for
> >> 902 seconds. Killing!

- Try increasing <name>mapred.task.timeout</name> tenfold; you are
currently using 900 seconds.

Another use case which I had before (with a Reducer): TIKA hangs forever
on some weird HTML. And another use case (BIXO): HttpClient configured
with 20 attempts and a 30-second timeout, which makes 20*30 = 600 seconds
of "non-reporting delay", so the default mapred.task.timeout of 600
seconds is not good enough.
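To make the keep-alive idea from my earlier message (quoted below)
concrete, here is a minimal, untested sketch against the old 0.20
org.apache.hadoop.mapred API. The class name and the helper method are
made up for illustration, and I use reporter.progress(), which is the
standard call for resetting the timeout clock. The timeout itself can
also be raised per job in the driver, e.g.
conf.setLong("mapred.task.timeout", 9000000L) - the value is in
milliseconds, so that is 10 x 900 seconds.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class KeepAliveReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private Thread heartbeat;  // started lazily on the first reduce() call

  // Spawn a daemon thread that pings the TaskTracker every 60 seconds so
  // the attempt is not killed while we block on an external service
  // (a SOLR update, a Lucene index merge, an HttpClient fetch, ...).
  private synchronized void startHeartbeat(final Reporter reporter) {
    if (heartbeat != null) {
      return;
    }
    heartbeat = new Thread(new Runnable() {
      public void run() {
        try {
          while (true) {
            reporter.progress();  // resets the mapred.task.timeout clock
            Thread.sleep(60L * 1000L);
          }
        } catch (InterruptedException e) {
          // close() was called; fall through and let the thread exit
        }
      }
    });
    heartbeat.setDaemon(true);  // never keep the child JVM alive
    heartbeat.start();
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output,
                     Reporter reporter) throws IOException {
    startHeartbeat(reporter);
    while (values.hasNext()) {
      // ... potentially slow call to the external service goes here ...
      output.collect(key, values.next());
    }
  }

  @Override
  public void close() throws IOException {
    if (heartbeat != null) {
      heartbeat.interrupt();  // stop pinging once the task is finished
    }
  }
}

The same trick works in a mapper; the only point is that something calls
progress() while the slow external call is in flight.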
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca
Data Mining, Vertical Search


> -----Original Message-----
> From: Simon Willnauer [mailto:[email protected]]
> Sent: December-09-09 1:46 PM
> To: Fuad Efendi
> Cc: [email protected]
> Subject: Re: Copyphase dies on timeout
>
> Hmm, it looks like the reducer hasn't even started, and I do not have
> an external resource at all.
>
> simon
>
> On Wed, Dec 9, 2009 at 7:39 PM, Fuad Efendi <[email protected]> wrote:
> > I had similar issues before; Reduce can do some 'additional' processing
> > and have a dependency on external services.
> >
> > For instance, in my case it sends documents to SOLR (Lucene based), so
> > I have a dependency on Lucene in a reducer. And Lucene sometimes needs
> > 30-50 minutes to perform an index merge. If your timeout settings are
> > not enough, the job will be killed. You may also execute
> > reporter.keepAlive() every 60 seconds (in a separate thread spawned by
> > your reducer).
> >
> > See <name>mapred.task.timeout</name>, and use reporter.progress(),
> > keepAlive(), etc.
> >
> >
> > Fuad Efendi
> > +1 416-993-2060
> > http://www.tokenizer.ca
> > Data Mining, Vertical Search
> >
> >
> >> -----Original Message-----
> >> From: Simon Willnauer [mailto:[email protected]]
> >> Sent: December-09-09 1:31 PM
> >> To: [email protected]
> >> Subject: Copyphase dies on timeout
> >>
> >> Hi folks,
> >>
> >> I have a question regarding a map-reduce job that gets killed
> >> during the copy phase due to a timeout. I don't really understand the
> >> output completely, so I am first looking for some help to understand
> >> what is going on.
> >> There are about 50 mappers running and each mapper succeeds. The job
> >> is configured with a single reducer (I assume this could be closely
> >> related to that?!)
> >> So this is the output:
> >>
> >> 2009-12-09 15:01:45,103 INFO org.apache.hadoop.mapred.Merger: Merging
> >> 5 sorted segments
> >> 2009-12-09 15:01:45,103 INFO org.apache.hadoop.mapred.Merger: Down to
> >> the last merge-pass, with 5 segments left of total size: 216092233
> >> bytes
> >> 2009-12-09 15:01:50,727 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Merge of the 5 files in-memory
> >> complete. Local file is
> >> /data/hadoop-data/mapred/local/taskTracker/jobcache/job_200912081402_0042/attempt_200912081402_0042_r_000000_0/output/map_275.out
> >> of size 216092229
> >> 2009-12-09 15:01:50,728 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Thread waiting: Thread for
> >> merging on-disk files
> >> 2009-12-09 15:01:58,109 INFO org.apache.hadoop.mapred.ReduceTask: Read
> >> 53281055 bytes from map-output for
> >> attempt_200912081402_0042_m_000253_0
> >> 2009-12-09 15:01:58,148 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> >> #1 from attempt_200912081402_0042_m_000253_0 -> (71, 103) from
> >> hadoop32
> >> 2009-12-09 15:01:58,273 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Need another 44 map output(s)
> >> where 3 is already in progress
> >> 2009-12-09 15:01:58,273 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
> >> and41 dup hosts)
> >> 2009-12-09 15:02:40,174 INFO org.apache.hadoop.mapred.ReduceTask: Read
> >> 81297732 bytes from map-output for
> >> attempt_200912081402_0042_m_000112_0
> >> 2009-12-09 15:02:40,175 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> >> #1 from attempt_200912081402_0042_m_000112_0 -> (129, 153) from
> >> hadoop31
> >> 2009-12-09 15:02:40,175 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 1 outputs (0 slow hosts
> >> and16 dup hosts)
> >> 2009-12-09 15:02:40,223 INFO org.apache.hadoop.mapred.ReduceTask:
> >> Shuffling 82364774 bytes (82364778 raw bytes) into RAM from
> >> attempt_200912081402_0042_m_000121_0
> >> 2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask: Read
> >> 82722679 bytes from map-output for
> >> attempt_200912081402_0042_m_000196_0
> >> 2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> >> #1 from attempt_200912081402_0042_m_000196_0 -> (114, 137) from
> >> hadoop21
> >> 2009-12-09 15:02:45,044 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 1 outputs (0 slow hosts
> >> and32 dup hosts)
> >> 2009-12-09 15:02:45,143 INFO org.apache.hadoop.mapred.ReduceTask:
> >> Shuffling 82639206 bytes (82639210 raw bytes) into RAM from
> >> attempt_200912081402_0042_m_000195_0
> >> 2009-12-09 15:02:58,274 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Need another 42 map output(s)
> >> where 3 is already in progress
> >> 2009-12-09 15:02:58,274 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
> >> and39 dup hosts)
> >> 2009-12-09 15:03:58,275 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Need another 42 map output(s)
> >> where 3 is already in progress
> >> 2009-12-09 15:03:58,275 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_200912081402_0042_r_000000_0 Scheduled 0 outputs (0 slow hosts
> >> and39 dup hosts)
> >> ......
> >>
> >> Task attempt_200912081402_0042_r_000000_0 failed to report status for
> >> 902 seconds. Killing!
> >>
> >> My first guess would be some IO problem, but I don't have any
> >> indicator of that in those logs. If somebody can clarify or give me a
> >> hint, that would be great. Even if it is something like: use more
> >> reducers because of ... <-- this is the interesting part at the moment!
> >> btw: this is hadoop-0.20
> >>
> >> simon
