I am sorry, but I got an error when I sent the message to the list, and all the responses went to my junk mail. So I tried to send it again, and only then noticed your emails.

>Please do also share if you're seeing an issue that you think is
>related to these log messages.

My datanodes do not have any big problem, but my regionservers are getting shut down by timeouts, and I think it is related to the datanodes. I have already tried a lot of different configurations, but they keep "crashing". I asked on the hbase list, but we could not find anything (the RSs seem healthy). We have 10 RSs and they get shut down 7 times per day. So I thought maybe you guys could find what is wrong with my system.
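(In case it helps: the regionserver "shutdown by timeout" is governed by the ZooKeeper session timeout, so that is one of the knobs I have been looking at. A minimal hbase-site.xml sketch, assuming HBase 0.90's stock property name; the value shown is just the 0.90 default, not a recommendation:)

  <!-- hbase-site.xml: how long a regionserver may be unresponsive
       (e.g. stuck in a long GC pause) before its ZooKeeper session
       expires and the master declares it dead. 180000 ms (3 min)
       is the 0.90 default; shown here only for illustration. -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
  </property>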
Thanks again,
Pablo

-----Original Message-----
From: Raj Vishwanathan [mailto:rajv...@yahoo.com]
Sent: Friday, July 20, 2012 14:38
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

This could also be due to network issues: the number of sockets could be too low, or the number of threads could be too low.

Raj
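(A quick way to verify the limits Raj mentions on a running datanode; a sketch assuming a Linux host, with the pid lookup via jps being illustrative:)

  # Find the DataNode pid (adjust if you run several JVMs per host).
  DN_PID=$(jps | awk '/DataNode/ {print $1}')

  # Limits the process actually received (sockets count as open files).
  grep -E 'open files|processes' /proc/$DN_PID/limits

  # Current usage: file descriptors (including sockets) and threads.
  ls /proc/$DN_PID/fd | wc -l
  grep Threads /proc/$DN_PID/status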
>________________________________
> From: Harsh J <ha...@cloudera.com>
>To: common-user@hadoop.apache.org
>Sent: Friday, July 20, 2012 9:06 AM
>Subject: Re: Datanode error
>
>Pablo,
>
>These all seem to be timeouts from clients when they wish to read a
>block, and drops from clients when they try to write a block. I wouldn't
>think of them as critical errors. Aside from being worried that a DN is
>logging these, are you noticing any usability issue in your cluster? If
>not, I'd simply blame this on stuff like speculative tasks, region
>servers, general HDFS client misbehavior, etc.
>
>Please do also share if you're seeing an issue that you think is
>related to these log messages.
>
>On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa <pa...@psafe.com> wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is up and
>> working. However, my datanodes keep hitting the same errors, over
>> and over.
>>
>> I googled the problems and tried different flags (e.g.
>> -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192),
>> but could not solve it.
>>
>> Does anyone know what the problem is and how I can solve it? (The
>> stacktraces are at the end.)
>>
>> I am running:
>> Java 1.7
>> Hadoop 0.20.2
>> HBase 0.90.6
>> ZooKeeper 3.3.5
>>
>> % top    -> shows a low load average (6% most of the time, up to 60%),
>>             already considering the number of CPUs
>> % vmstat -> shows no swapping at all
>> % sar    -> shows 75% idle CPU in the worst case
>>
>> Hope you guys can help me.
>> Thanks in advance,
>> Pablo
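(For reference, a sketch of how the two settings Pablo mentions are normally expressed on 0.20.x, using his values and the stock names from that release; illustrative, not his actual files:)

  <!-- hdfs-site.xml: upper bound on concurrent DataXceiver threads per
       datanode. Note the historical misspelling of the property name
       in 0.20.x. 8192 is the value reported above. -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
  </property>

  # hadoop-env.sh: apply the direct-memory cap to the datanode JVM only.
  export HADOOP_DATANODE_OPTS="-XX:MaxDirectMemorySize=2G $HADOOP_DATANODE_OPTS"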
>>
>> 2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
>> src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ,
>> cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427,
>> offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047,
>> blockid: blk_914960691839012728_14061688, duration: 480061254006
>> 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): Got exception while serving
>> blk_914960691839012728_14061688 to /DN01:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>> for channel to be ready for write. ch : java.nio.channels.SocketChannel[
>> connected local=/DN01:50010 remote=/DN01:43516]
>>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
>>
>> 2012-07-20 00:03:44,455 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): DataXceiver
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting
>> for channel to be ready for write. ch : java.nio.channels.SocketChannel[
>> connected local=/DN01:50010 remote=/DN01:43516]
>>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
>>
>> 2012-07-20 00:12:11,949 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> Verification succeeded for blk_4602445008578088178_5707787
>> 2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> writeBlock blk_-8916344806514717841_14081066 received exception
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>> for channel to be ready for read. ch : java.nio.channels.SocketChannel[
>> connected local=/DN01:36634 remote=/DN03:50010]
>> 2012-07-20 00:12:11,962 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): DataXceiver
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>> for channel to be ready for read. ch : java.nio.channels.SocketChannel[
>> connected local=/DN01:36634 remote=/DN03:50010]
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>     at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>     at java.io.DataInputStream.readShort(DataInputStream.java:312)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
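(The two timeout values above correspond to configurable settings: 480000 ms is the stock dfs.datanode.socket.write.timeout of 8 minutes, and 63000 ms is consistent with the default 60 s dfs.socket.timeout plus a small extension per node in the write pipeline. A sketch of where one would tune them in hdfs-site.xml; the values shown are just the defaults:)

  <!-- Read-side socket timeout used when datanodes/clients wait for
       data; default 60000 ms. The "63000 millis" above is this value
       plus a per-pipeline-node extension. -->
  <property>
    <name>dfs.socket.timeout</name>
    <value>60000</value>
  </property>

  <!-- Write-side socket timeout; default 480000 ms (8 minutes), which
       matches the "480000 millis" traces above. -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>480000</value>
  </property>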
>>
>> 2012-07-20 00:12:20,670 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> Verification succeeded for blk_7238561256016868237_3555939
>> 2012-07-20 00:12:22,541 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Receiving block blk_-7028120671250332363_14081073 src: /DN03:50331 dest: /DN01:50010
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Exception in receiveBlock for block blk_-7028120671250332363_14081073
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_-7028120671250332363_14081073 Interrupted.
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_-7028120671250332363_14081073 terminating
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> writeBlock blk_-7028120671250332363_14081073 received exception
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:22,544 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>>
>> 2012-07-20 00:12:34,266 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Receiving block blk_-1834839455324747507_14081046 src: /DN05:59897 dest: /DN01:50010
>> 2012-07-20 00:12:34,267 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Exception in receiveBlock for block blk_-1834839455324747507_14081046
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_-1834839455324747507_14081046 Interrupted.
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_-1834839455324747507_14081046 terminating
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> writeBlock blk_-1834839455324747507_14081046 received exception
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,268 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>>
>> 2012-07-20 00:12:34,269 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Receiving block blk_3941134611454287401_14080990 src: /DN03:50345 dest: /DN01:50010
>> 2012-07-20 00:12:34,270 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> Exception in receiveBlock for block blk_3941134611454287401_14080990
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,270 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_3941134611454287401_14080990 Interrupted.
>> 2012-07-20 00:12:34,271 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> PacketResponder 0 for block blk_3941134611454287401_14080990 terminating
>> 2012-07-20 00:12:34,271 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
>> writeBlock blk_3941134611454287401_14080990 received exception
>> java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047,
>> infoPort=50075, ipcPort=50020): DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>
>--
>Harsh J