Re: Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
On Wed, 20 Oct 2010 19:49:09 +0200, Erik Forsberg <forsb...@opera.com> wrote:

Hi!

I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113), and after the upgrade I'm getting the following error in the reducers during the copy phase of one of my larger jobs:

2010-10-20 17:43:22,343 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 12 segments...
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Merging 12 sorted segments
2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 382660295 bytes
2010-10-20 17:43:22,517 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_00_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)

What does an EOFException in this code actually mean? Is it hiding some other error that could tell me more about what's wrong?

I'm seeing quite a few of these in my datanode logs:

2010-10-21 10:21:01,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.20.11.66:50010, storageID=DS-71308762-10.20.11.66-50010-126995760, infoPort=50075, ipcPort=50020): Got exception while serving blk_1081044479123523815_4852013 to /10.20.11.88:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.20.11.66:50010 remote=/10.20.11.88:41347]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

Could that be related somehow? I'm also seeing large amounts of mortbay exceptions, but MAPREDUCE-5 says they are harmless.

Things I've tried so far, none of which made a difference:

*) Running with and without compressed map output.
*) Running with -Xmx512m and with -Xmx768m.
*) Decreasing the number of mappers and reducers on all nodes to lower the overall load.
*) Decreasing mapred.reduce.parallel.copies from 16 back to the default of 5.

I also tried doubling the number of reducers so that each reducer processes less data, but that didn't help either :-(

\EF
--
Erik Forsberg <forsb...@opera.com>
Developer, Opera Software - http://www.opera.com/
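PS. For what it's worth, my current (possibly wrong) reading of that code path: the raw WritableComparator.compare() deserializes both keys with readFields() before comparing them, so an EOFException there should mean readFields() ran off the end of a serialized segment — i.e. the map output being merged is truncated or corrupt, or a key class reads more bytes than its write() produced. A made-up Java sketch of the second kind of bug (BrokenKey is hypothetical, not a class from my job, which uses plain BytesWritable):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical key whose write() and readFields() disagree. During the
// reduce-side merge, WritableComparator.compare() calls readFields() on
// the raw serialized bytes; the extra readInt() below runs past the end
// of the key's buffer and surfaces as java.io.EOFException wrapped in a
// RuntimeException -- the same shape as the error above.
public class BrokenKey implements WritableComparable<BrokenKey> {
    private long id;
    private int flags;

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);        // serializes 8 bytes ...
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        flags = in.readInt();     // ... but deserializes 12: EOFException
    }

    public int compareTo(BrokenKey o) {
        return id < o.id ? -1 : (id == o.id ? 0 : 1);
    }
}

Since my keys are stock BytesWritable, I assume that rules out a serialization bug in my own code and points at truncated segment data instead.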
Re: Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
On Thu, 21 Oct 2010 12:44:24 +0200, Erik Forsberg <forsb...@opera.com> wrote:

> attempt_201010201640_0001_r_00_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)

In addition, some reducers first fail with:

2010-10-21 10:31:24,696 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010201640_0186_m_000579_0, compressed len: 2281015, decompressed len: 10368075
2010-10-21 10:31:24,696 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 10368075 bytes (2281015 raw bytes) into RAM from attempt_201010201640_0186_m_000579_0
2010-10-21 10:31:24,744 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2010-10-21 10:31:24,854 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 582879560 bytes
2010-10-21 10:31:27,396 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201010201640_0186_r_27_2 : Failed to merge in memory
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:119)
        at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:98)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)
2010-10-21 10:31:27,397 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0186_r_27_2 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2529)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:154)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
        ... 7 more
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2533)
2010-10-21 10:31:27,714 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-10-21 10:31:27,717 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-10-21 10:31:27,727 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager

Then, on the second try, they fail with the java.io.EOFException above.
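Reading the two traces together, my guess (unverified) is that they are connected: the in-memory merge thread dies with OutOfMemoryError halfway through Merger.writeFile(), which would leave a truncated segment on local disk, and the LocalFSMerger then hits end-of-stream in the middle of a record when it merges that file — hence the EOFException. If that's right, the OOM is the real problem, so I'm going to experiment with giving the in-memory merge less to hold at once. A sketch of the knobs I mean, using the 0.20-era property names (the values below are guesses to try, not recommendations):

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Fraction of the reducer heap used to buffer shuffled map
        // outputs (default 0.70). Lowering it leaves more headroom
        // for the merge itself.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);

        // Start merging to disk once the in-memory buffer is this
        // full (default 0.66).
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);

        // ... or after this many in-memory segments (default 1000).
        conf.setInt("mapred.inmem.merge.threshold", 100);

        // Submit the job with this conf as usual.
    }
}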
\EF
--
Erik Forsberg <forsb...@opera.com>
Developer, Opera Software - http://www.opera.com/
Re: Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
Sent from my iPod

On 2010/10/21, at 6:44 PM, Erik Forsberg <forsb...@opera.com> wrote:

> On Wed, 20 Oct 2010 19:49:09 +0200, Erik Forsberg <forsb...@opera.com> wrote:
>
> Hi! I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113),
> and after the upgrade I'm getting the following error in the reducers
> during the copy phase of one of my larger jobs:
> [...]