Re: write to most datanode fail quickly

2014-10-14 Thread Ted Yu
Which Hadoop release are you using ?

Have you run fsck ?

Cheers
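(For context, the fsck check suggested above is typically run as below; the target path is illustrative and the command needs a running cluster, so treat this as a sketch of the command shape, not a definitive invocation.)

```shell
# Check filesystem health; -files/-blocks/-locations also lists
# per-block replica placement, useful for spotting bad datanodes.
hdfs fsck / -files -blocks -locations
```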

On Oct 14, 2014, at 2:31 AM, sunww spe...@outlook.com wrote:

 Hi
 I'm using HBase with about 20 regionservers. One regionserver quickly 
 failed to write to most of the datanodes, which finally caused that 
 regionserver to die, while the other regionservers are fine. 
 
 logs like this:
 
 java.io.IOException: Bad response ERROR for block 
 BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 
 from datanode 132.228.248.20:50010
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
 2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
 for block 
 BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in 
 pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: 
 bad datanode 132.228.248.20:50010
 2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: 
 DFSOutputStream ResponseProcessor exception  for block 
 BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
 java.io.IOException: Bad response ERROR for block 
 BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 
 from datanode 132.228.248.41:50010
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
 
 
 
 then several firstBadLink errors: 
 2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception 
 in createBlockOutputStream
 java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)
 
 
 then several "Failed to add a datanode" errors:
 2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error 
 while syncing
 java.io.IOException: Failed to add a datanode.  User may turn off this 
 feature by setting dfs.client.block.write.replace-datanode-on-failure.policy 
 in configuration, where the current policy is DEFAULT.  (Nodes: 
 current=[132.228.248.17:50010, 132.228.248.35:50010], 
 original=[132.228.248.17:50010, 132.228.248.35:50010])
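(For reference, the knobs named in this last exception live in hdfs-site.xml; a sketch showing the Hadoop 2.x defaults, per the hdfs-default.xml description:)

```xml
<!-- Pipeline-recovery behavior referenced by the exception above.
     DEFAULT tries to add a replacement datanode on failure; NEVER keeps
     writing to the surviving nodes instead, which is sometimes preferable
     on small clusters where no healthy replacement exists. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value> <!-- DEFAULT, ALWAYS, or NEVER -->
</property>
```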
 
 the full log is in http://paste2.org/xfn16jm2
 
 Any suggestion will be appreciated. Thanks.
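(Editor's note on the DEFAULT policy named in the last exception: per the hdfs-default.xml description, with replication r and n datanodes remaining in the pipeline, a replacement is attempted only when r >= 3 and either floor(r/2) >= n, or r > n and the block has been hflushed/appended. A minimal sketch of that decision rule; the function name is illustrative, not an HDFS API:)

```python
def should_add_replacement(replication: int, live_datanodes: int,
                           hflushed_or_appended: bool) -> bool:
    """Sketch of the DEFAULT replace-datanode-on-failure decision,
    as described in hdfs-default.xml (not actual HDFS client code)."""
    r, n = replication, live_datanodes
    if r < 3:
        return False
    return (r // 2 >= n) or (r > n and hflushed_or_appended)

# In the log above: replication 3, two surviving nodes, and the block was
# hflushed (HBase WAL sync, "Error while syncing"), so the client tries
# to add a replacement datanode -- and fails because none is usable.
print(should_add_replacement(3, 2, True))   # True: r > n and hflushed
print(should_add_replacement(3, 3, False))  # False: pipeline still full
```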


RE: write to most datanode fail quickly

2014-10-14 Thread sunww

I'm using Hadoop 2.0.0 and have not run fsck. Only one regionserver has 
these dfs logs, which is strange.

Thanks

Re: write to most datanode fail quickly

2014-10-14 Thread Ted Yu
Can you check NameNode log for 132.228.48.20 ?

Have you turned on short circuit read ?

Cheers



RE: write to most datanode fail quickly

2014-10-14 Thread sunww
Hi
dfs.client.read.shortcircuit is true.
This is the namenode log at that moment: http://paste2.org/U0zDA9ms
There seems to be nothing special in the namenode log. 

Thanks
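(For reference, short-circuit read is enabled with the hdfs-site.xml property below; note it affects the local read path, while the errors earlier in the thread are in the write pipeline.)

```xml
<!-- hdfs-site.xml: enables short-circuit (local) reads. The exact
     companion settings vary by release; Hadoop 2.0.x used a legacy
     implementation, while later 2.x releases add a domain-socket path. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
```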
  

Re: write to most datanode fail quickly

2014-10-14 Thread Ted Yu
132.228.48.20 didn't show up in the snippet (spanning only 3 minutes) you
posted.

I don't see an error or exception either.

Perhaps search in a wider scope.


RE: write to most datanode fail quickly

2014-10-14 Thread sunww
Hi
The correct IP is 132.228.248.20. I checked the hdfs log on the dead 
regionserver; it has some error messages, maybe they're useful:
http://paste2.org/NwpcaGVv
Thanks
