[ https://issues.apache.org/jira/browse/HBASE-19768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433343#comment-16433343 ]
chenxu commented on HBASE-19768: -------------------------------- *FanOutOneBlockAsyncDFSOutputHelper#createOutput* if overwrite mode is true, is there any need to recover the file lease? the client can reuse it. how about modify it like this {code:java} } finally { if (!succ) { if (futureList != null) { for (Future<Channel> f : futureList) { f.addListener(new FutureListener<Channel>() { @Override public void operationComplete(Future<Channel> future) throws Exception { if (future.isSuccess()) { future.getNow().close(); } } }); } } if(!overwrite) { endFileLease(client, stat.getFileId()); fsUtils.recoverFileLease(dfs, new Path(src), conf, new CancelOnClose(client)); } } } {code} > RegionServer startup failing when DN is dead > -------------------------------------------- > > Key: HBASE-19768 > URL: https://issues.apache.org/jira/browse/HBASE-19768 > Project: HBase > Issue Type: Bug > Reporter: Jean-Marc Spaggiari > Assignee: Duo Zhang > Priority: Critical > Fix For: 2.0.0-beta-2, 2.0.0 > > Attachments: HBASE-19768.patch > > > When starting HBase, if the datanode hosted on the same host is dead but not > yet detected by the namenode, HBase will fail to start > {code} > 515691223393/node8.distparser.com%2C16020%2C1515691223393.1515691238778 > failed, retry = 7 > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: > syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010 > at > org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown > Source) > Caused by: > org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: > syscall:getsockopt(..) failed: Connexion refusée > ... 1 more > {code} > and will also get stuck to stop: > {code} > hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh > stopping > hbase....................................................................................................................................................................................................^C > hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh > stopping > hbase.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > {code} > The most interesting is that it seems to fail the same way even if the DN is > declared dead on HDFS side: > {code} > 515692041367/node8.distparser.com%2C16020%2C1515692041367.1515692057716 > failed, retry = 4 > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: > syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010 > at > org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown > Source) > Caused by: > org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: > syscall:getsockopt(..) failed: Connexion refusée > ... 1 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)