[ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009409#comment-16009409
 ] 

Andrew Purtell commented on HBASE-18027:
----------------------------------------

Let me add some debug level logging where we detect the worklist exceeds RPC 
size limits in the proposed changes.  That's a good point. It will indicate if 
the various configurations are in conflict and aid debugging if the caller's 
size estimation is broken or there is some other factor causing unexpectedly 
large worklists to be handed to Replicator.

> HBaseInterClusterReplicationEndpoint should respect RPC size limits when 
> batching edits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18027
>                 URL: https://issues.apache.org/jira/browse/HBASE-18027
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0, 1.3.1
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 1.4.0, 1.3.2
>
>         Attachments: HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to