[ https://issues.apache.org/jira/browse/HBASE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540631#comment-13540631 ]
Lars Hofhansl commented on HBASE-7293:
--------------------------------------

Can anybody else have a quick look? Otherwise I'll move it to 0.94.5.

> [replication] Remove dead sinks from ReplicationSource.currentPeers, it's spammy
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-7293
>                 URL: https://issues.apache.org/jira/browse/HBASE-7293
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3, 0.96.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: 7293-0.94.txt, 7293-0.94-v2.txt, 7293-0.96.txt
>
>
> I happened to look at a log today where I saw a lot of lines like this:
> {noformat}
> 2012-12-06 23:29:08,318 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: This server is in the failed servers list: sv4r20s49/10.4.20.49:10304
> 2012-12-06 23:29:15,987 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error:
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:519)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:484)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>     at $Proxy14.replicateLogEntries(Unknown Source)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:627)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
> 2012-12-06 23:29:15,988 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: Connection refused
> {noformat}
> What struck me as weird is that this had been going on for some days; I would
> expect the RS to find new servers if it wasn't able to replicate. But in
> reality only a few of the chosen sink RSs were down, so the source eventually
> hits one that's good and never gets to refresh its list of servers.
> We should remove the dead servers from the list: the retries are spammy and
> probably add some slave lag.
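
For anyone picking up the review, the idea in the description (drop a sink from the chosen list once it keeps refusing connections, and pick a fresh list only when none are left) could be sketched roughly as below. This is not the attached patch. The class `SinkTracker`, its method names, and the `maxFailures` threshold are all hypothetical names made up for illustration, and sinks are represented as plain `host:port` strings rather than HBase's own server types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, NOT the HBASE-7293 patch: tracks the sinks a
 * replication source has chosen and drops the ones that keep failing.
 */
class SinkTracker {
    // Sinks currently eligible for shipping edits (stand-in for currentPeers).
    private final List<String> currentPeers;
    // Consecutive failure count per sink.
    private final Map<String, Integer> failures = new HashMap<>();
    // Assumed threshold: failures tolerated before a sink is dropped.
    private final int maxFailures;

    SinkTracker(List<String> chosenSinks, int maxFailures) {
        this.currentPeers = new ArrayList<>(chosenSinks);
        this.maxFailures = maxFailures;
    }

    /**
     * Records a failed attempt against a sink. Once the sink reaches the
     * threshold it is removed from the list; returns true if it was removed.
     */
    boolean reportFailure(String sink) {
        int count = failures.merge(sink, 1, Integer::sum);
        if (count >= maxFailures) {
            failures.remove(sink);
            return currentPeers.remove(sink);
        }
        return false;
    }

    /** A successful shipment clears the sink's failure count. */
    void reportSuccess(String sink) {
        failures.remove(sink);
    }

    /** True once every chosen sink has been dropped and a new list is needed. */
    boolean needsRefresh() {
        return currentPeers.isEmpty();
    }

    /** Defensive copy of the sinks still considered alive. */
    List<String> peers() {
        return new ArrayList<>(currentPeers);
    }
}
```

With two sinks and `maxFailures` set to 2, a sink that fails twice in a row disappears from `peers()`, so the source stops retrying it and stops logging the same connection error; `needsRefresh()` only turns true when the last sink is gone. Counting consecutive failures rather than dropping on the first error is a design guess meant to tolerate a transient hiccup.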