Re: MapReduce jobs hanging or failing near completion

Arun C Murthy Tue, 19 Jul 2011 14:01:41 -0700

Is this reproducible? If so, I'd urge you to check your local disks...

Arun
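Checking the local disks can be scripted per node. Below is a minimal Python sketch. It is not part of Hadoop, and it rests on a few assumptions. You pass in the directories listed in `mapred.local.dir`; the spill file in the trace below lives under `/mnt/hadoop/mapred/local`. The function names and the 1 GiB free-space threshold are hypothetical. For each directory, the sketch checks three things: the path exists, a small file can be written, fsynced and deleted, and free space is above the threshold. A passing check does not rule out intermittent I/O errors on the device.

```python
import os
import shutil
import tempfile


def parse_local_dirs(value):
    """Split a comma-separated directory list (as in mapred.local.dir) into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def _probe_write(directory):
    """Create, fsync and delete a scratch file; raises OSError on failure."""
    fd, path = tempfile.mkstemp(dir=directory, prefix=".disk-probe-")
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the write through to the underlying device
    finally:
        os.close(fd)
        os.remove(path)


def check_local_dirs(dirs, min_free_bytes=1 << 30):
    """Return (directory, problem) pairs for directories that look unhealthy.

    min_free_bytes defaults to 1 GiB, an arbitrary threshold for this sketch.
    """
    problems = []
    for d in dirs:
        if not os.path.isdir(d):
            problems.append((d, "missing"))
            continue
        try:
            # Fails on read-only remounts, permission problems and similar errors.
            _probe_write(d)
        except OSError as exc:
            problems.append((d, "write failed: " + str(exc)))
            continue
        free = shutil.disk_usage(d).free
        if free < min_free_bytes:
            problems.append((d, "only %d bytes free" % free))
    return problems
```

On a TaskTracker node you might call `check_local_dirs(parse_local_dirs("/mnt/hadoop/mapred/local"))`. An empty list means every directory passed all three checks.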


On Jul 19, 2011, at 12:41 PM, Kai Ju Liu wrote:

> Hi Marcos. The issue appears to be the following. A reduce task is unable to
> fetch the output of a map task, so the map task is re-run, but the second
> attempt then fails because it cannot find one of its own spill files on local
> disk. Here is the error from the second map attempt:
> java.io.FileNotFoundException: /mnt/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201107171642_0560/attempt_201107171642_0560_m_000292_1/output/spill0.out
>       at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
>       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
>       at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>       at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>       at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>       at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>       at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1547)
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1179)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 
> I have been having general difficulties running HDFS on EBS volumes, which is
> what led me to suspect EBS here. Does that sound like a plausible hypothesis to
> you? Thanks!
> 
> 
> Kai Ju
> 
> P.S. I am migrating off of HDFS on EBS, so I will post back with further 
> results as soon as I have them.
>
> On Thu, Jul 7, 2011 at 6:36 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
> 
> 
> On 7/7/2011 8:43 PM, Kai Ju Liu wrote:
> 
>> Over the past week or two, I've run into an issue where MapReduce jobs
>> hang or fail near completion. The percent completion of both map and
>> reduce tasks is often reported as 100%, but the actual number of
>> completed tasks is less than the total number. It appears that either
>> tasks backtrack and need to be restarted or the last few reduce tasks
>> hang interminably on the copy step.
>>
>> In certain cases, the jobs actually complete. In other cases, I can't
>> wait long enough and have to kill the job manually.
>>
>> My Hadoop cluster is hosted in EC2 on instances of type c1.xlarge with 4
>> attached EBS volumes. The instances run Ubuntu 10.04.1 with the
>> 2.6.32-309-ec2 kernel, and I'm currently using Cloudera's CDH3u0
>> distribution. Has anyone experienced similar behavior in their clusters,
>> and if so, had any luck resolving it? Thanks!
> 
> Can you post your NameNode (NN) and DataNode (DN) log files here?
> Regards
> 
> Kai Ju
> 
> -- 
> Marcos Luís Ortíz Valmaseda
>  Software Engineer (UCI)
>  Linux User # 418229
>  http://marcosluis2186.posterous.com
>  http://twitter.com/marcosluis2186
>
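Kai Ju's trace shows the re-run map attempt failing to open its own spill file (`.../output/spill0.out`) under `/mnt/hadoop/mapred/local`. If `mapred.local.dir` spans several volumes, it helps to see whether such errors cluster under one local directory. Below is a minimal Python sketch with a hypothetical function name. It assumes each exception message carries the full path with a `/taskTracker/` component, as in the trace above. It also tolerates a line break between the exception name and the path.

```python
import re
from collections import Counter

# Captures the local directory root that precedes "/taskTracker/" in the path
# reported by a FileNotFoundException for a missing spillN.out file.
_MISSING_SPILL = re.compile(
    r"FileNotFoundException:\s*(\S+?)/taskTracker/\S*?spill\d+\.out"
)


def tally_missing_spills(log_text):
    """Count missing spill files per local directory root in raw log text."""
    return Counter(m.group(1) for m in _MISSING_SPILL.finditer(log_text))
```

If the counts pile up under one root, the volume behind that directory is the first candidate for the disk check Arun suggests.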
