RE: Datanode error
I am sorry, but I got an error when I first sent the message to the list, and all the responses went to my junk mail. So I sent it again, and only then noticed your emails. Sorry!!

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Monday, July 23, 2012 11:07
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

Pablo,

Perhaps you've forgotten about it, but you asked the same question last week and did get some responses. Please see your earlier thread at http://search-hadoop.com/m/0BOOh17ugmD

On Mon, Jul 23, 2012 at 7:27 PM, Pablo Musa wrote:
> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
> However, my datanodes keep hitting the same errors over and over.
> [...]
Re: Datanode error
Pablo,

Perhaps you've forgotten about it, but you asked the same question last week and did get some responses. Please see your earlier thread at http://search-hadoop.com/m/0BOOh17ugmD

On Mon, Jul 23, 2012 at 7:27 PM, Pablo Musa wrote:
> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
> However, my datanodes keep hitting the same errors over and over.
>
> I googled the problems and tried different flags (e.g. -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.
>
> Does anyone know what the problem is and how I can solve it? (The stack trace is at the end.)
>
> I am running:
> Java 1.7
> Hadoop 0.20.2
> HBase 0.90.6
> ZooKeeper 3.3.5
>
> % top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
> % vmstat -> shows no swapping at all
> % sar    -> shows 75% idle CPU in the worst case
>
> Hope you guys can help me.
> Thanks in advance,
> Pablo
>
> [...]
RE: Datanode error
I am sorry, but I got an error when I first sent the message to the list, and all the responses went to my junk mail. So I sent it again, and only then noticed your emails.

> Please do also share if you're seeing an issue that you think is
> related to these log messages.

My datanodes do not have any big problem, but my regionservers are getting shut down by timeout, and I think it is related to the datanodes. I have already tried a lot of different configurations, but they keep "crashing". I asked on the hbase list, but we could not find anything (the RSs seem healthy). We have 10 RSs and they get shut down 7 times per day. So I thought maybe you guys could find what is wrong with my system.

Thanks again,
Pablo

-----Original Message-----
From: Raj Vishwanathan [mailto:rajv...@yahoo.com]
Sent: Friday, July 20, 2012 14:38
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

Could also be due to network issues. You may be running low on sockets, or on threads.

Raj

> From: Harsh J
> To: common-user@hadoop.apache.org
> Sent: Friday, July 20, 2012 9:06 AM
> Subject: Re: Datanode error
>
> Pablo,
>
> These all seem to be timeouts from clients when they wish to read a block, and drops from clients when they try to write a block. I wouldn't think of them as critical errors. Aside from being worried that a DN is logging these, are you noticing any usability issue in your cluster? If not, I'd simply blame this on stuff like speculative tasks, region servers, general HDFS client misbehavior, etc.
>
> Please do also share if you're seeing an issue that you think is related to these log messages.
>
> On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
>> However, my datanodes keep hitting the same errors over and over.
>> [...]
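On the regionserver side: "shut down by timeout" in HBase 0.90 is often a ZooKeeper session expiration (frequently caused by long GC pauses), and the session length itself is configurable. A minimal hbase-site.xml sketch; the 90-second value is only an illustrative assumption, not something recommended in this thread:

    <property>
      <name>zookeeper.session.timeout</name>
      <!-- milliseconds; example value only, tune to your cluster and ZK limits -->
      <value>90000</value>
    </property>

Raising the timeout only hides the symptom if the real cause is GC or I/O stalls, so it is worth checking the regionserver GC logs alongside this.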
Datanode error
Hey guys,
I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
However, my datanodes keep hitting the same errors over and over.

I googled the problems and tried different flags (e.g. -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.

Does anyone know what the problem is and how I can solve it? (The stack trace is at the end.)

I am running:
Java 1.7
Hadoop 0.20.2
HBase 0.90.6
ZooKeeper 3.3.5

% top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
% vmstat -> shows no swapping at all
% sar    -> shows 75% idle CPU in the worst case

Hope you guys can help me.
Thanks in advance,
Pablo

[...]
Re: Datanode error
Could also be due to network issues. You may be running low on sockets, or on threads.

Raj

> From: Harsh J
> To: common-user@hadoop.apache.org
> Sent: Friday, July 20, 2012 9:06 AM
> Subject: Re: Datanode error
>
> Pablo,
>
> These all seem to be timeouts from clients when they wish to read a block, and drops from clients when they try to write a block. I wouldn't think of them as critical errors. Aside from being worried that a DN is logging these, are you noticing any usability issue in your cluster? If not, I'd simply blame this on stuff like speculative tasks, region servers, general HDFS client misbehavior, etc.
>
> Please do also share if you're seeing an issue that you think is related to these log messages.
>
> On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
>> However, my datanodes keep hitting the same errors over and over.
>> [...]
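If low socket or thread limits are the suspicion, one quick check is the limits of the user actually running the DataNode/RegionServer JVMs, since init scripts and login shells can end up with different values. A rough sketch; the "hdfs" user name and the example numbers are assumptions, not values taken from this thread:

    # limits of the running DataNode process
    cat /proc/$(pgrep -f DataNode | head -n1)/limits | egrep -i 'open files|processes'

    # example persistent settings in /etc/security/limits.conf for the
    # user that runs the daemons (user name "hdfs" is an assumption):
    #   hdfs  -  nofile  32768
    #   hdfs  -  nproc   65536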
Re: Datanode error
Pablo,

These all seem to be timeouts from clients when they wish to read a block, and drops from clients when they try to write a block. I wouldn't think of them as critical errors. Aside from being worried that a DN is logging these, are you noticing any usability issue in your cluster? If not, I'd simply blame this on stuff like speculative tasks, region servers, general HDFS client misbehavior, etc.

Please do also share if you're seeing an issue that you think is related to these log messages.

On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa wrote:
> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
> However, my datanodes keep hitting the same errors over and over.
>
> I googled the problems and tried different flags (e.g. -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.
>
> Does anyone know what the problem is and how I can solve it? (The stack trace is at the end.)
>
> I am running:
> Java 1.7
> Hadoop 0.20.2
> HBase 0.90.6
> ZooKeeper 3.3.5
>
> % top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
> % vmstat -> shows no swapping at all
> % sar    -> shows 75% idle CPU in the worst case
>
> Hope you guys can help me.
> Thanks in advance,
> Pablo
>
> [...]
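For reference, the two figures in the logs (the write-side timeout and the 63000 ms read timeout) correspond to datanode socket timeouts that are tunable in hdfs-site.xml on the 0.20.x line. A hedged sketch; the values shown are, as far as I recall, the defaults of that era, so verify them against your own hdfs-default before relying on them:

    <property>
      <!-- socket read timeout in ms (the 63000 in the log looks like this default plus a small per-node extension) -->
      <name>dfs.socket.timeout</name>
      <value>60000</value>
    </property>
    <property>
      <!-- socket write timeout in ms -->
      <name>dfs.datanode.socket.write.timeout</name>
      <value>480000</value>
    </property>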
Re: Datanode error
Hi Pablo,

Are you sure that Hadoop 0.20.2 is supported on Java 1.7? (AFAIK it's Java 1.6.)

Thanks,
Anil

On Fri, Jul 20, 2012 at 6:07 AM, Pablo Musa wrote:
> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
> However, my datanodes keep hitting the same errors over and over.
>
> I googled the problems and tried different flags (e.g. -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.
>
> Does anyone know what the problem is and how I can solve it? (The stack trace is at the end.)
>
> I am running:
> Java 1.7
> Hadoop 0.20.2
> HBase 0.90.6
> ZooKeeper 3.3.5
>
> % top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
> % vmstat -> shows no swapping at all
> % sar    -> shows 75% idle CPU in the worst case
>
> Hope you guys can help me.
> Thanks in advance,
> Pablo
>
> [...]
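If the JVM version is in doubt, it may be worth confirming what each node actually runs, since JAVA_HOME in hadoop-env.sh and the shell's default java can differ. A small sketch; the hostnames below are placeholders, not the real node names from this cluster:

    for h in NN DN01 DN02 DN03 DN04 DN05 DN06 DN07 DN08 DN09 DN10; do
      echo "== $h =="
      ssh "$h" 'java -version 2>&1 | head -n1'   # java -version prints to stderr
    done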
Datanode error
Hey guys,
I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
However, my datanodes keep hitting the same errors over and over.

I googled the problems and tried different flags (e.g. -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.

Does anyone know what the problem is and how I can solve it? (The stack trace is at the end.)

I am running:
Java 1.7
Hadoop 0.20.2
HBase 0.90.6
ZooKeeper 3.3.5

% top    -> shows a low load average (6% most of the time, up to 60%), already accounting for the number of CPUs
% vmstat -> shows no swapping at all
% sar    -> shows 75% idle CPU in the worst case

Hope you guys can help me.
Thanks in advance,
Pablo

2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047, blockid: blk_914960691839012728_14061688, duration: 480061254006
2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to /DN01:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
2012-07-20 00:03:44,455 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
2012-07-20 00:12:11,949 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4602445008578088178_5707787
2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8916344806514717841_14081066 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
2012-07-20 00:12:11,962 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.DataInputStream.readShort(DataInputStream.java:312)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
2012-07-20 00:12:20,670 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7238561256016868237_3555939
2012-07-20 00:12:22,541 INFO org.apache.hadoop.hdfs.server.datanode.Dat
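For anyone reading this thread later: the "xceivers=8192" mentioned above refers to the datanode transceiver limit, which on this Hadoop line is set in hdfs-site.xml under a historically misspelled property name. A sketch using the value the poster already tried:

    <property>
      <!-- max concurrent block readers/writers (DataXceiver threads) per datanode -->
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>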