[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

Nick Dimiduk (JIRA) Mon, 09 Sep 2013 17:18:09 -0700

    [ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762507#comment-13762507 ]


Nick Dimiduk commented on HBASE-7634:
-------------------------------------

Is there a corresponding patch to the book and/or package summary documenting 
the additional ZK watches and configuration points this patch introduces? How 
about a release note (at least for config)?
                
> Replication handling of changes to peer clusters is inefficient
> ---------------------------------------------------------------
>
>                 Key: HBASE-7634
>                 URL: https://issues.apache.org/jira/browse/HBASE-7634
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.95.2
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.98.0, 0.95.2
>
>         Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
> HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
> HBASE-7634.v6.patch
>
>
> The handling of changes to the region servers in a replication peer 
> cluster is currently quite inefficient. The list of region servers that are 
> being replicated to is only updated if there are a large number of issues 
> encountered while replicating.
> This can cause it to take quite a while to recognize that a number of the 
> regionservers in a peer cluster are no longer available. A potentially bigger 
> problem is that if a replication peer cluster is started with a small number 
> of regionservers, and then more region servers are added after replication 
> has started, the additional region servers will never be used for replication 
> (unless there are failures on the in-use regionservers).
> Part of the current issue is that the retry code in 
> ReplicationSource#shipEdits checks a randomly-chosen replication peer 
> regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
> replication write has failed on a different randomly-chosen replication peer. 
> If the peer is seen as not down, another randomly-chosen peer is used for 
> writing.
> A second part of the issue is that changes to the list of region servers in a 
> peer cluster are not detected at all, and are only picked up if a certain 
> number of failures have occurred when trying to ship edits.
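
The second part of the issue description (peer regionserver changes going
unnoticed) can be illustrated with a rough sketch. This is not the HBase
implementation or the attached patch: PeerSinkSketch, PeerCluster, SinkList,
and every method name below are hypothetical, and a plain in-memory listener
stands in for whatever ZooKeeper watch the patch actually registers. It only
shows the general idea of refreshing the list of sinks as soon as the peer's
membership changes, rather than after a number of failed shipments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch only; none of these types exist in HBase. */
public class PeerSinkSketch {

    /** Stand-in for the peer cluster's live regionserver list. */
    static class PeerCluster {
        private final List<String> servers = new ArrayList<>();
        private final List<Runnable> listeners = new ArrayList<>();

        /** Register a callback, playing the role of a watch on the peer. */
        void addListener(Runnable listener) {
            listeners.add(listener);
        }

        /** Replace the membership and notify every registered listener. */
        void setServers(List<String> newServers) {
            servers.clear();
            servers.addAll(newServers);
            for (Runnable listener : listeners) {
                listener.run();
            }
        }

        /** Return a copy so callers never see later mutations. */
        List<String> currentServers() {
            return new ArrayList<>(servers);
        }
    }

    /** Source-side view of the sinks, refreshed on every peer change. */
    static class SinkList {
        private volatile List<String> sinks;
        private final Random random;

        SinkList(PeerCluster peer, Random random) {
            this.random = random;
            this.sinks = peer.currentServers();
            // Refresh immediately on change instead of waiting for failures.
            peer.addListener(() -> this.sinks = peer.currentServers());
        }

        List<String> sinks() {
            return sinks;
        }

        /** Pick a random sink from the most recent snapshot. */
        String chooseSink() {
            List<String> snapshot = sinks;
            if (snapshot.isEmpty()) {
                throw new IllegalStateException("no sinks available");
            }
            return snapshot.get(random.nextInt(snapshot.size()));
        }
    }

    public static void main(String[] args) {
        PeerCluster peer = new PeerCluster();
        peer.setServers(Arrays.asList("rs1"));
        SinkList sinks = new SinkList(peer, new Random());

        // Regionservers added after replication has started become usable
        // without any shipment having to fail first.
        peer.setServers(Arrays.asList("rs1", "rs2", "rs3"));
        System.out.println(sinks.sinks());
    }
}
```

In this sketch the source never has to count failures to learn about new or
removed regionservers; whether the real patch behaves the same way, and which
configuration points it adds, is exactly what the comment above asks to have
documented.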

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
