[ 
https://issues.apache.org/jira/browse/HBASE-19768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433377#comment-16433377
 ] 

Duo Zhang commented on HBASE-19768:
-----------------------------------

OK, I know what the problem is now. In general we need to call recover lease to 
close the file. I'm not sure whether HDFS allows overwriting a file that is 
still being written. If it does, then maybe we can bypass the recover lease, but 
endFileLease must still be called; otherwise the file will stay open forever 
unless we restart the RS.
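
For illustration only, here is a minimal sketch of what "call recover lease until 
the file is closed" can look like. It is not taken from the attached patch. 
LeaseRecoverer is a hypothetical stand-in for HDFS's 
DistributedFileSystem.recoverLease(Path), which returns true once the file is 
closed; the retry count, the pause, and the example path are assumptions.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LeaseRecoverySketch {

    /** Hypothetical stand-in for DistributedFileSystem.recoverLease(Path). */
    @FunctionalInterface
    interface LeaseRecoverer {
        /** Returns true once the file is closed, false if recovery is still in progress. */
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Calls recoverLease until it reports the file as closed or the attempts run out.
     * An IOException is treated as transient here (an assumption) and simply retried.
     */
    static boolean recoverWithRetry(LeaseRecoverer fs, String path,
                                    int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (fs.recoverLease(path)) {
                    return true; // file is closed, safe to go on
                }
            } catch (IOException e) {
                // fall through and retry after the pause
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        return false; // still not closed after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        // Fake recoverer that only reports "closed" on the third call.
        AtomicInteger calls = new AtomicInteger();
        LeaseRecoverer fake = p -> calls.incrementAndGet() >= 3;
        boolean closed = recoverWithRetry(fake, "/hbase/WALs/example", 5, 10L);
        System.out.println("closed=" + closed + " after " + calls.get() + " calls");
    }
}
```

The sketch only covers the recover-lease side; releasing the client-side lease 
via endFileLease is a separate step and is not modelled here.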

> RegionServer startup failing when DN is dead
> --------------------------------------------
>
>                 Key: HBASE-19768
>                 URL: https://issues.apache.org/jira/browse/HBASE-19768
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.0.0-beta-2, 2.0.0
>
>         Attachments: HBASE-19768.patch
>
>
> When starting HBase, if the datanode hosted on the same host is dead but not 
> yet detected by the namenode, HBase will fail to start:
> {code}
> 515691223393/node8.distparser.com%2C16020%2C1515691223393.1515691238778 
> failed, retry = 7
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
>  Source)
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
>  syscall:getsockopt(..) failed: Connexion refusée
>       ... 1 more
> {code}
> and will also hang when stopping:
> {code}
> hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
> stopping 
> hbase....................................................................................................................................................................................................^C
> hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
> stopping 
> hbase..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}
> Most interesting is that it seems to fail the same way even if the DN is 
> declared dead on the HDFS side:
> {code}
> 515692041367/node8.distparser.com%2C16020%2C1515692041367.1515692057716 
> failed, retry = 4
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
>  Source)
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
>  syscall:getsockopt(..) failed: Connexion refusée
>       ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
