[ 
https://issues.apache.org/jira/browse/HBASE-19768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433343#comment-16433343
 ] 

chenxu commented on HBASE-19768:
--------------------------------

*FanOutOneBlockAsyncDFSOutputHelper#createOutput*
If overwrite mode is true, is there any need to recover the file lease? The 
client can simply reuse it.
How about modifying it like this:

{code:java}
} finally {
  if (!succ) {
    if (futureList != null) {
      for (Future<Channel> f : futureList) {
        f.addListener(new FutureListener<Channel>() {
          @Override
          public void operationComplete(Future<Channel> future) throws Exception {
            if (future.isSuccess()) {
              future.getNow().close();
            }
          }
        });
      }
    }
    if (!overwrite) {
      endFileLease(client, stat.getFileId());
      fsUtils.recoverFileLease(dfs, new Path(src), conf, new CancelOnClose(client));
    }
  }
}
{code}
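
To make the intended behaviour of the change easier to check, here is a minimal, self-contained sketch of just the lease decision in the failure path. It is not HBase code: LeaseOps, RecordingLeaseOps, cleanupAfterFailure and leaseCallsOnFailure are hypothetical names made up for illustration, and the two LeaseOps methods merely stand in for endFileLease(...) and fsUtils.recoverFileLease(...) from the snippet above. Closing the pending channels is left out. Whether skipping recovery is actually safe for the overwrite case is still the open question.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types exist in HBase.
final class CreateOutputCleanupSketch {

  /** Hypothetical stand-in for the two lease calls made in the finally block. */
  interface LeaseOps {
    void endFileLease();
    void recoverFileLease();
  }

  /** Records which lease calls were made so the decision can be inspected. */
  static final class RecordingLeaseOps implements LeaseOps {
    final List<String> calls = new ArrayList<>();

    @Override
    public void endFileLease() {
      calls.add("endFileLease");
    }

    @Override
    public void recoverFileLease() {
      calls.add("recoverFileLease");
    }
  }

  /**
   * Mirrors the proposed finally block: nothing to clean up on success, and on
   * failure the lease is ended and recovered only when overwrite is false.
   */
  static void cleanupAfterFailure(boolean succ, boolean overwrite, LeaseOps ops) {
    if (succ) {
      return;
    }
    if (!overwrite) {
      ops.endFileLease();
      ops.recoverFileLease();
    }
  }

  /** Convenience helper: the lease calls made when creation fails. */
  static List<String> leaseCallsOnFailure(boolean overwrite) {
    RecordingLeaseOps ops = new RecordingLeaseOps();
    cleanupAfterFailure(false, overwrite, ops);
    return ops.calls;
  }

  public static void main(String[] args) {
    System.out.println("overwrite=true  -> " + leaseCallsOnFailure(true));
    System.out.println("overwrite=false -> " + leaseCallsOnFailure(false));
  }
}
```

With overwrite set to true the sketch makes no lease calls on failure; with overwrite set to false it ends the lease and then recovers it, which is what the current code does in every failure case.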


> RegionServer startup failing when DN is dead
> --------------------------------------------
>
>                 Key: HBASE-19768
>                 URL: https://issues.apache.org/jira/browse/HBASE-19768
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.0.0-beta-2, 2.0.0
>
>         Attachments: HBASE-19768.patch
>
>
> When starting HBase, if the datanode hosted on the same host is dead but not 
> yet detected by the namenode, HBase will fail to start:
> {code}
> 515691223393/node8.distparser.com%2C16020%2C1515691223393.1515691238778 failed, retry = 7
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
>       at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
> Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connexion refusée
>       ... 1 more
> {code}
> and will also hang on shutdown:
> {code}
> hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
> stopping 
> hbase....................................................................................................................................................................................................^C
> hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh 
> stopping 
> hbase..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}
> The most interesting part is that it seems to fail the same way even if the DN 
> is declared dead on the HDFS side:
> {code}
> 515692041367/node8.distparser.com%2C16020%2C1515692041367.1515692057716 failed, retry = 4
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
>       at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
> Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connexion refusée
>       ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
