RE: Datanode error

2012-07-23 Thread Pablo Musa
I am sorry, but I received an error when I sent the message to the list, and all 
responses went to my junk mail folder. So I sent it again, and only then noticed 
your emails.

Please do also share if you're seeing an issue that you think is 
related to these log messages.

My datanodes do not have any big problems, but my regionservers are getting shut 
down by timeouts, and I think it is related to the datanodes. I have already tried 
a lot of different configurations, but they keep crashing. I asked on the HBase 
list, but we could not find anything (the RSs seem healthy). We have 10 RSs and 
they get shut down 7 times per day.

So I thought maybe you guys could find what is wrong with my system.

Thanks again,
Pablo
 
-Original Message-
From: Raj Vishwanathan [mailto:rajv...@yahoo.com] 
Sent: sexta-feira, 20 de julho de 2012 14:38
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

It could also be due to network issues. The number of available sockets or the 
number of threads could be too low.

Raj




 From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Friday, July 20, 2012 9:06 AM
Subject: Re: Datanode error
 
Pablo,

These all seem to be timeouts from clients when they wish to read a 
block and drops from clients when they try to write a block. I wouldn't 
think of them as critical errors. Aside from being worried that a DN is 
logging these, are you noticing any usability issue in your cluster? If 
not, I'd simply blame this on stuff like speculative tasks, region 
servers, general HDFS client misbehavior, etc.

Please do also share if you're seeing an issue that you think is 
related to these log messages.

On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa pa...@psafe.com wrote:
 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and 
 working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex: 
 -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192) but could 
 not solve it.
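
 For reference, a minimal sketch of where these two settings usually live on a
 datanode, assuming a stock Hadoop 0.20.x configuration layout (file names and
 placement are assumptions, not taken from this thread):

     <!-- conf/hdfs-site.xml: raise the transceiver limit (note the historical
          spelling of the property name) -->
     <property>
       <name>dfs.datanode.max.xcievers</name>
       <value>8192</value>
     </property>

     # conf/hadoop-env.sh: pass the JVM flag only to the DataNode process
     export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -XX:MaxDirectMemorySize=2G"

 A datanode restart is needed for either change to take effect.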

 Does anyone know what is the problem and how can I solve it? (the 
 stacktrace is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already 
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case
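
 The checks above map to standard commands; a small sketch (the intervals are
 illustrative, not quoted from this thread):

     top              # load average and overall cpu usage
     vmstat 5 5       # si/so columns stay at 0 when there is no swapping
     sar -u 5 3       # %idle column; 75% idle in the worst case reported here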

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, 
 srvID: DS-798921853-DN01-50010-1328651609047, blockid: 
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 
 to /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting 
for channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/DN01:50010 
remote=/DN01:43516]
         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(DN01:50010, 
storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting 
for channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/DN01:50010 
remote=/DN01:43516]
         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

Re: Datanode error

2012-07-23 Thread Harsh J
Pablo,

Perhaps you've forgotten about it, but you asked the same question last
week and you did have some responses on it. Please see your earlier
thread at http://search-hadoop.com/m/0BOOh17ugmD

On Mon, Jul 23, 2012 at 7:27 PM, Pablo Musa pa...@psafe.com wrote:
 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex: 
 -XX:MaxDirectMemorySize=2G)
 and different configs (xceivers=8192) but could not solve it.

 Does anyone know what is the problem and how can I solve it? (the stacktrace 
 is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already 
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, 
 srvID: DS-798921853-DN01-50010-1328651609047, blockid: 
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to 
 /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:12:11,949 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
 succeeded for blk_4602445008578088178_5707787
 2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 writeBlock blk_-8916344806514717841_14081066 received exception 
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 2012-07-20 00:12:11,962 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at java.io.DataInputStream.readShort(DataInputStream.java:312)
 at 
 

RE: Datanode error

2012-07-23 Thread Pablo Musa
I am sorry, but I received an error when I sent the message to the list, and all 
responses went to my junk mail folder. 
So I sent it again, and only then noticed your emails.

Sorry!!

-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: segunda-feira, 23 de julho de 2012 11:07
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

Pablo,

Perhaps you've forgotten about it, but you asked the same question last week and 
you did have some responses on it. Please see your earlier thread at 
http://search-hadoop.com/m/0BOOh17ugmD

On Mon, Jul 23, 2012 at 7:27 PM, Pablo Musa pa...@psafe.com wrote:
 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex: 
 -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192) but could 
 not solve it.

 Does anyone know what is the problem and how can I solve it? (the 
 stacktrace is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already 
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, 
 srvID: DS-798921853-DN01-50010-1328651609047, blockid: 
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to 
 /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:12:11,949 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
 succeeded for blk_4602445008578088178_5707787
 2012-07-20 00:12:11,962 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock 
 blk_-8916344806514717841_14081066 received exception 
 java.net.SocketTimeoutException: 63000 millis timeout while waiting 
 for channel to be ready for read. ch : 
 java.nio.channels.SocketChannel[connected local=/DN01:36634 
 remote=/DN03:50010]
 2012-07-20 00:12:11,962 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at 
 org.apache.hadoop.net.SocketInputStream.read

Re: Datanode error

2012-07-20 Thread anil gupta
Hi Pablo,

Are you sure that Hadoop 0.20.2 is supported on Java 1.7? (AFAIK it's Java
1.6)

Thanks,
Anil

On Fri, Jul 20, 2012 at 6:07 AM, Pablo Musa pa...@psafe.com wrote:

 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and
 working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex:
 -XX:MaxDirectMemorySize=2G)
 and different configs (xceivers=8192) but could not solve it.

 Does anyone know what is the problem and how can I solve it? (the
 stacktrace is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID:
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544,
 srvID: DS-798921853-DN01-50010-1328651609047, blockid:
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode:
 DatanodeRegistration(DN01:50010,
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075,
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688
 to /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting for
 channel to be ready for write. ch :
 java.nio.channels.SocketChannel[connected local=/DN01:50010
 remote=/DN01:43516]
 at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode:
 DatanodeRegistration(DN01:50010,
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075,
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for
 channel to be ready for write. ch :
 java.nio.channels.SocketChannel[connected local=/DN01:50010
 remote=/DN01:43516]
 at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:12:11,949 INFO
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
 succeeded for blk_4602445008578088178_5707787
 2012-07-20 00:12:11,962 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
 blk_-8916344806514717841_14081066 received exception
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for
 channel to be ready for read. ch :
 java.nio.channels.SocketChannel[connected local=/DN01:36634
 remote=/DN03:50010]
 2012-07-20 00:12:11,962 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode:
 DatanodeRegistration(DN01:50010,
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075,
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for
 channel to be ready for read. ch :
 java.nio.channels.SocketChannel[connected local=/DN01:36634
 remote=/DN03:50010]
 at
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
 at
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
 at
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at java.io.DataInputStream.readShort(DataInputStream.java:312)
 at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
 at
 

Re: Datanode error

2012-07-20 Thread Harsh J
Pablo,

These all seem to be timeouts from clients when they wish to read a
block and drops from clients when they try to write a block. I
wouldn't think of them as critical errors. Aside from being worried that
a DN is logging these, are you noticing any usability issue in your
cluster? If not, I'd simply blame this on stuff like speculative
tasks, region servers, general HDFS client misbehavior, etc.

Please do also share if you're seeing an issue that you think is
related to these log messages.

On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa pa...@psafe.com wrote:
 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex: 
 -XX:MaxDirectMemorySize=2G)
 and different configs (xceivers=8192) but could not solve it.

 Does anyone know what is the problem and how can I solve it? (the stacktrace 
 is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already 
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, 
 srvID: DS-798921853-DN01-50010-1328651609047, blockid: 
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to 
 /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:50010 remote=/DN01:43516]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at 
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:12:11,949 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
 succeeded for blk_4602445008578088178_5707787
 2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 writeBlock blk_-8916344806514717841_14081066 received exception 
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 2012-07-20 00:12:11,962 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
 at 
 

Re: Datanode error

2012-07-20 Thread Raj Vishwanathan
It could also be due to network issues. The number of available sockets or the 
number of threads could be too low.

Raj
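
To check the two limits Raj mentions on a datanode host, a hedged sketch (the 
property name and the user name are the usual ones for this Hadoop generation, 
not quoted from his mail):

    # open file/socket limit for the user running the datanode
    ulimit -n

    # raise it persistently, e.g. in /etc/security/limits.conf
    hdfs  -  nofile  65536

    <!-- conf/hdfs-site.xml: server threads handling block requests -->
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>10</value>
    </property>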




 From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org 
Sent: Friday, July 20, 2012 9:06 AM
Subject: Re: Datanode error
 
Pablo,

These all seem to be timeouts from clients when they wish to read a
block and drops from clients when they try to write a block. I
wouldn't think of them as critical errors. Aside from being worried that
a DN is logging these, are you noticing any usability issue in your
cluster? If not, I'd simply blame this on stuff like speculative
tasks, region servers, general HDFS client misbehavior, etc.

Please do also share if you're seeing an issue that you think is
related to these log messages.

On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa pa...@psafe.com wrote:
 Hey guys,
 I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and 
 working.
 However my datanodes keep having the same errors, over and over.

 I googled the problems and tried different flags (ex: 
 -XX:MaxDirectMemorySize=2G)
 and different configs (xceivers=8192) but could not solve it.

 Does anyone know what is the problem and how can I solve it? (the stacktrace 
 is at the end)

 I am running:
 Java 1.7
 Hadoop 0.20.2
 Hbase 0.90.6
 Zoo 3.3.5

 % top - shows low load average (6% most of the time up to 60%), already 
 considering the number of cpus
 % vmstat - shows no swap at all
 % sar - shows 75% idle cpu in the worst case

 Hope you guys can help me.
 Thanks in advance,
 Pablo

 2012-07-20 00:03:44,455 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
 /DN01:50010, dest:
 /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: 
 DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, 
 srvID: DS-798921853-DN01-50010-1328651609047, blockid: 
 blk_914960691839012728_14061688, duration:
 480061254006
 2012-07-20 00:03:44,455 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 
 to /DN01:
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : 
 java.nio.channels.SocketChannel[connected local=/DN01:50010 
 remote=/DN01:43516]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:03:44,455 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for 
 channel to be ready for write. ch : 
 java.nio.channels.SocketChannel[connected local=/DN01:50010 
 remote=/DN01:43516]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
         at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
         at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
         at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
         at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

 2012-07-20 00:12:11,949 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
 succeeded for blk_4602445008578088178_5707787
 2012-07-20 00:12:11,962 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock 
 blk_-8916344806514717841_14081066 received exception 
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
 local=/DN01:36634 remote=/DN03:50010]
 2012-07-20 00:12:11,962 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(DN01:50010, 
 storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 63000 millis timeout while waiting for 
 channel to be ready for read. ch : java.nio.channels.SocketChannel