Hi,

Thank you. Before changing the parameters you suggested, I tried running the job with 128 reducers instead of 64, and it completed. Since this is a production environment, I will wait to see whether this fix holds up consistently before I change anything in mapred-site.xml.

I am still wondering what the root cause of this exception is; documentation for it is hard to come by.
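For what it's worth, the deepest frame in the trace below is Text.readFields inside com.conduit.UserLoginLog.Distinct2ActiveUsersKey.readFields, so one common root cause worth ruling out is a write()/readFields() asymmetry in the custom key: every byte written must be read back in the same order, or the merge stream becomes misaligned and a later key read runs off the end of the buffer. Below is a minimal sketch of the symmetric pattern; the real class isn't shown in this thread, so the two Text fields and their names are assumptions.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical sketch of the custom key; the actual fields are unknown.
public class Distinct2ActiveUsersKey implements WritableComparable<Distinct2ActiveUsersKey> {

    private final Text userId = new Text(); // illustrative field name
    private final Text day = new Text();    // illustrative field name

    @Override
    public void write(DataOutput out) throws IOException {
        // Whatever is written here, in this order...
        userId.write(out);
        day.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...must be read back here, in the same order; any mismatch
        // leaves the stream misaligned and surfaces later as EOFException.
        userId.readFields(in);
        day.readFields(in);
    }

    @Override
    public int compareTo(Distinct2ActiveUsersKey other) {
        int cmp = userId.compareTo(other.userId);
        return cmp != 0 ? cmp : day.compareTo(other.day);
    }
}

If the key already follows this pattern, another suspect to check is corruption of the LZO-compressed map output during the shuffle.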
Date: Sun, 23 Jan 2011 11:41:05 -0500
Subject: Re: Intermediate merge failed
From: [email protected]
To: [email protected]

Try modifying some of these parameters (a JobConf sketch of setting them per job follows the quoted message below):

("io.sort.mb", "350");
("io.sort.factor", "100");
("io.file.buffer.size", "131072");
("mapred.child.java.opts", "-Xms1024m -Xmx1024m");
("mapred.reduce.parallel.copies", "8");
("mapred.tasktracker.map.tasks.maximum", "12");

-Raja Thiruvathuru

On Sun, Jan 23, 2011 at 9:26 AM, David Ginzburg <[email protected]> wrote:

Hi,

My cluster contains 22 DataNodes/TaskTrackers, each with 8 map slots and 4 reduce slots, each slot with a 1.5 GB max heap. I use Cloudera CDH2.

I have a specific job that consistently fails in the reduce phase. It runs with 64 reducers and a 64 MB block size, and the map output is compressed with LZO. The same exception appears on all failed reduce tasks:

The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Intermediate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2651)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)
Caused by: java.lang.RuntimeException: java.io.EOFException
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
    ... 1 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.io.Text.readFields(Text.java:265)
    at com.conduit.UserLoginLog.Distinct2ActiveUsersKey.readFields(Distinct2ActiveUsersKey.java:114)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
    ... 8 more

I do expect a relatively large map output from this job. My mapred-site.xml contains:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <final>false</final>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
  <final>false</final>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>64</value>
</property>
<property>
  <name>mapred.job.reduce.input.buffer.percent</name>
  <value>0.9</value>
</property>
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.8</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1536m -Djava.library.path=/usr/lib/hadoop/lib/native -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false</value>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Can anyone speculate as to what's causing this? How can I at least make the job complete?

--
Raja Thiruvathuru
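As referenced above, here is a minimal sketch of applying the suggested values through the old-style JobConf API. The values are the ones proposed in the thread, not verified defaults, and the class name MergeTuning is purely illustrative.

import org.apache.hadoop.mapred.JobConf;

public class MergeTuning {
    // Applies the thread's suggested sort/shuffle settings to a job.
    public static void applySuggestedSettings(JobConf conf) {
        conf.set("io.sort.mb", "350");                  // map-side sort buffer (MB)
        conf.set("io.sort.factor", "100");              // streams merged at once
        conf.set("io.file.buffer.size", "131072");      // 128 KB I/O buffers
        conf.set("mapred.child.java.opts", "-Xms1024m -Xmx1024m");
        conf.set("mapred.reduce.parallel.copies", "8"); // parallel shuffle fetchers
        // Note: TaskTracker-level settings such as the one below are read when
        // the daemon starts, so per-job overrides normally have no effect.
        conf.set("mapred.tasktracker.map.tasks.maximum", "12");
    }
}

Setting these per job, rather than in mapred-site.xml, has the side benefit of not changing cluster-wide behavior on a production cluster while experimenting.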
