[ https://issues.apache.org/jira/browse/HBASE-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450306#comment-13450306 ]

stack commented on HBASE-3814:
------------------------------

I think the basic idea of a kill switch for when the RS is stuck going down 
is a good one.  Let's open a new issue if we see this happen again (even if 
the scenario is as unusual as the one described above seems to be, it seems 
like a good safety mechanism to have).
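
For illustration, a minimal sketch of what such a kill switch could look 
like (AbortWatchdog and abortTimeoutMs are hypothetical names; this is not 
the actual HBase implementation):

    // Sketch: a daemon thread armed at the start of abort() that
    // force-kills the JVM if orderly shutdown does not finish in time.
    public class AbortWatchdog extends Thread {
      private final long abortTimeoutMs;

      public AbortWatchdog(long abortTimeoutMs) {
        this.abortTimeoutMs = abortTimeoutMs;
        setDaemon(true);                 // must not itself keep the JVM alive
        setName("abort-watchdog");
      }

      @Override
      public void run() {
        try {
          Thread.sleep(abortTimeoutMs);  // give the orderly abort a chance
        } catch (InterruptedException e) {
          return;                        // abort completed; watchdog cancelled
        }
        // Abort is stuck (e.g. blocked flushing the WAL into a dead HDFS
        // write pipeline). halt() skips shutdown hooks and finalizers and
        // terminates the process immediately, unlike System.exit().
        Runtime.getRuntime().halt(1);
      }
    }

abort() would start the watchdog first (e.g. new AbortWatchdog(60000).start()) 
and interrupt it once shutdown completes. The halt()-versus-exit() distinction 
matters here: System.exit() runs shutdown hooks, which could themselves block 
on the same hung filesystem.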
                
> force regionserver to halt
> --------------------------
>
>                 Key: HBASE-3814
>                 URL: https://issues.apache.org/jira/browse/HBASE-3814
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Prakash Khemani
>
> Once abort() on a regionserver is called, we should have a timeout thread 
> that calls Runtime.halt() if the RS gets stuck somewhere during abort 
> processing.
> ===
> Pumahbase132 has the following logs. The DFSClient is not able to set up a 
> write pipeline successfully, so the RS tries to abort, but while aborting 
> it gets stuck. I know there is a check that if we are aborting because the 
> filesystem is closed then we should not try to flush the logs while 
> aborting. But in this case the fs is up and running; it is just not 
> functioning.
> 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 java.io.IOException: Bad connect ack with firstBadLink 10.38.133.33:50010
> 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8967376451767492285_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
> 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 java.io.IOException: Bad connect ack with firstBadLink 10.38.134.59:50010
> 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7172251852699100447_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
> 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 java.io.IOException: Bad connect ack with firstBadLink 10.38.134.53:50010
> 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9153204772467623625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
> 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 java.io.IOException: Bad connect ack with firstBadLink 10.38.134.49:50010
> 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2513098940934276625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
> 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977)
> 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null
> 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280" - Aborting...
> 2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> And then the RS gets stuck trying to roll the logs ...
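
To make the failure mode concrete, here is a hedged sketch of the abort path 
and the filesystem-closed guard described above (every name is illustrative, 
not taken from the HBase source). The gap is that the guard only catches a 
closed FileSystem object; in this incident the fs was open, so the check 
passed, and the subsequent WAL flush blocked forever on the broken write 
pipeline:

    public class AbortPathSketch {
      // Stand-in for the filesystem check mentioned above: returns true if
      // the FileSystem object is still open. It can return true even when
      // every write would hang, which is exactly the case in these logs.
      boolean fileSystemLooksAlive() { return true; }

      void flushAndCloseWAL() {
        // In the incident above, this is where the RS blocked: the log
        // roll waits on a new HDFS block pipeline that never forms.
      }

      void abort(String reason) {
        if (fileSystemLooksAlive()) {
          flushAndCloseWAL();  // may never return; hence the kill switch
        }
        // The rest of shutdown never runs if the flush above hangs.
      }
    }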
