[jira] [Created] (HBASE-9591) [replication] getting "Current list of sinks is out of date" all the time when a source is recovered

Jean-Daniel Cryans (JIRA) Thu, 19 Sep 2013 17:51:07 -0700

Jean-Daniel Cryans created HBASE-9591:
-----------------------------------------


             Summary: [replication] getting "Current list of sinks is out of 
date" all the time when a source is recovered
                 Key: HBASE-9591
                 URL: https://issues.apache.org/jira/browse/HBASE-9591
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.96.0
            Reporter: Jean-Daniel Cryans
            Priority: Minor
             Fix For: 0.96.1


I tried killing a region server while the slave cluster was down. From that 
point on, my log was filled with:

{noformat}
2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
Current list of sinks is out of date, updating
2013-09-20 00:31:04,226 INFO  
[ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634]
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
Current list of sinks is out of date, updating
{noformat}

The first log line is from the normal source, the second is from the recovered 
one. When we try to replicate, we call replicationSinkMgr.getReplicationSink(). 
If the list of machines was refreshed since the last time, it calls 
chooseSinks(), which in turn refreshes the list of sinks and resets our 
lastUpdateToPeers. The next source notices the change and calls chooseSinks() 
too. The first source then comes back for another round, sees that the list 
was refreshed, and calls chooseSinks() again. This repeats until the recovered 
queue is gone.
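
To make the loop easier to see, here is a minimal, self-contained model of it. 
It is not the HBase code: the classes Peer and SinkManager, the logical clock, 
and the simulate() helper are hypothetical stand-ins, and the model assumes 
that every chooseSinks() call bumps a change timestamp shared by both sources, 
as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class SinkPingPong {
    /** Logical clock standing in for System.currentTimeMillis() (assumption). */
    static long clock = 0;

    /** Hypothetical stand-in for the peer state shared by all sources. */
    static class Peer {
        long lastChange = 0;

        /** Re-reads the slave's server list and records when that happened. */
        List<String> refreshServers() {
            lastChange = ++clock;
            return new ArrayList<>(); // slave cluster is down: nothing listed
        }
    }

    /** Hypothetical, stripped-down sink manager; each source owns one. */
    static class SinkManager {
        final Peer peer;
        long lastUpdateToPeers = 0;
        int refreshes = 0; // how often "out of date, updating" would be logged

        SinkManager(Peer peer) {
            this.peer = peer;
        }

        void getReplicationSink() {
            // The shared list changed after our last update: refresh it.
            if (peer.lastChange > lastUpdateToPeers) {
                chooseSinks();
            }
        }

        void chooseSinks() {
            peer.refreshServers();       // bumps the shared timestamp
            lastUpdateToPeers = ++clock; // we are current, the other source is not
            refreshes++;
        }
    }

    /** Runs both sources for the given number of rounds; returns refresh counts. */
    static int[] simulate(int rounds) {
        clock = 0;
        Peer peer = new Peer();
        SinkManager normal = new SinkManager(peer);
        SinkManager recovered = new SinkManager(peer);
        peer.refreshServers(); // initial change to the peer's server list
        for (int i = 0; i < rounds; i++) {
            normal.getReplicationSink();
            recovered.getReplicationSink();
        }
        return new int[] { normal.refreshes, recovered.refreshes };
    }

    public static void main(String[] args) {
        int[] counts = simulate(5);
        // Both sources refresh on every single round.
        System.out.println("normal=" + counts[0] + " recovered=" + counts[1]);
    }
}
```

In this model a single source would refresh once and then stay quiet. With two 
sources, each refresh invalidates the other one, so neither ever settles.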

We could have all the sources going to the same cluster share a thread-safe 
ReplicationSinkManager. We could also manage the same cluster separately for 
each source. Or, even easier, if the list we get in chooseSinks() is the same 
as the one we had before, consider it a no-op.
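
The last option could look roughly like the sketch below. It is only an 
illustration under assumptions: PeerSinks, update() and getLastChange() are 
hypothetical names, not existing HBase APIs, and servers are plain strings. 
The idea is that a refresh which returns the same set of servers leaves the 
change timestamp alone, so other sources have nothing to react to.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NoopSinkRefresh {
    /** Hypothetical holder for one peer's known servers and change timestamp. */
    static class PeerSinks {
        private Set<String> servers = Collections.emptySet();
        private long lastChange = 0;

        /**
         * Stores the freshly fetched list. Returns false, and leaves the
         * timestamp untouched, when the set of servers did not change.
         */
        boolean update(List<String> fetched, long now) {
            // Compare as sets so that a different ordering is not a change.
            Set<String> fresh = new HashSet<>(fetched);
            if (fresh.equals(servers)) {
                return false; // same list as before: treat as a no-op
            }
            servers = fresh;
            lastChange = now;
            return true;
        }

        long getLastChange() {
            return lastChange;
        }
    }

    public static void main(String[] args) {
        PeerSinks peer = new PeerSinks();
        System.out.println(peer.update(Arrays.asList("rs1", "rs2"), 10L)); // true
        System.out.println(peer.update(Arrays.asList("rs2", "rs1"), 20L)); // false
        System.out.println(peer.getLastChange());                          // 10
    }
}
```

Comparing sets rather than lists is a design choice of this sketch; it assumes 
the order in which servers are listed carries no meaning.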

What do you think [~gabriel.reid]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
