[jira] [Comment Edited] (HBASE-18027) HBaseInterClusterReplicationEndpoint should respect RPC size limits when batching edits

Andrew Purtell (JIRA) Sat, 13 May 2017 08:52:35 -0700

    [ https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009381#comment-16009381 ]


Andrew Purtell edited comment on HBASE-18027 at 5/13/17 3:51 PM:
-----------------------------------------------------------------

[~lhofhansl] The worklist handed to HICRE#Replicator can exceed the RPC limit,
so we break it into separate RPCs if we need to. I think that's the best place
to do it, since it is the code that is directly involved with creating the
RPCs. The replication batch limit and the RPC size limit can be configured in
ways that accidentally conflict, so we need to do this check at the last step.
The code in ReplicationSourceWorkerThread.readAllEntriesToReplicateOrNextFile
can generate an overly large worklist because the size check occurs after the
current entry (which can put it over the limit) is added to the list. This can
be changed, but I still contend the best place to check whether we are
exceeding RPC limits is where we are generating the RPCs.
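
To illustrate the idea, here is a minimal sketch of splitting one worklist
into several RPC-sized batches. It is not the code from the attached patch:
the class RpcBatchSplitter, the Entry type, and the maxRpcBytes parameter are
hypothetical names used only for this example. The size check happens before
an entry is added, so the current entry cannot push a batch over the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HBase implementation.
public class RpcBatchSplitter {

    /** Hypothetical stand-in for a WAL entry; only its serialized size matters here. */
    public static final class Entry {
        final long sizeBytes;

        public Entry(long sizeBytes) {
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Splits the worklist, in order, into batches whose total size stays at or
     * below maxRpcBytes. An entry that is larger than the limit on its own
     * cannot be split further and is placed in a batch by itself.
     */
    public static List<List<Entry>> split(List<Entry> worklist, long maxRpcBytes) {
        List<List<Entry>> batches = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        long currentBytes = 0;
        for (Entry e : worklist) {
            // Check BEFORE adding: close the current batch if this entry would overflow it.
            if (!current.isEmpty() && currentBytes + e.sizeBytes > maxRpcBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e);
            currentBytes += e.sizeBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then be sent as its own replication RPC. Entry order
is preserved across batches, so edits are still shipped in worklist order.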


was (Author: apurtell):
[~lhofhansl] The worklist handed to HICRE#Replicator can exceed the RPC limit,
so we break it into separate RPCs if we need to. I think that's the best place
to do it, since it is the code that is directly involved with creating the
RPCs. The replication batch limit and the RPC size limit can be configured in
ways that accidentally conflict, so we need to do this check at the last step.

> HBaseInterClusterReplicationEndpoint should respect RPC size limits when 
> batching edits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18027
>                 URL: https://issues.apache.org/jira/browse/HBASE-18027
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0, 1.3.1
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 1.4.0, 1.3.2
>
>         Attachments: HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test whether the total 
> size of the data in each of the N lists will exceed RPC size limits. This 
> code presumes each individual edit is reasonably small. Not checking the 
> aggregate size while assembling the lists into RPCs is an oversight and can 
> lead to replication failure when that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.
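
The partitioning step in the issue description quoted above can be sketched
roughly as follows. This is an illustration under stated assumptions, not the
actual HBaseInterClusterReplicationEndpoint code: the class
RegionHashPartitioner and its method names are hypothetical, the "number of
100-waledit batches" is assumed to be the entry count divided by 100 and
rounded up, and String.hashCode stands in for whatever hash the real code
applies to the encoded region name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HBase implementation.
public class RegionHashPartitioner {

    /** N = min(replicator threads, number of 100-entry batches, current sinks), at least 1. */
    public static int listCount(int maxThreads, int numEntries, int numSinks) {
        // Assumption: a partial batch of fewer than 100 entries still counts as one batch.
        int batchesOf100 = Math.max(1, (numEntries + 99) / 100);
        return Math.max(1, Math.min(maxThreads, Math.min(batchesOf100, numSinks)));
    }

    /**
     * Places each entry, in order, into one of n lists chosen by the hash of
     * its encoded region name. Entries are represented here only by that name.
     */
    public static List<List<String>> partition(List<String> encodedRegionNames, int n) {
        List<List<String>> lists = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            lists.add(new ArrayList<>());
        }
        for (String region : encodedRegionNames) {
            // floorMod keeps the index non-negative even for negative hash codes.
            lists.get(Math.floorMod(region.hashCode(), n)).add(region);
        }
        return lists;
    }
}
```

Because the list index depends only on the region name, all edits for one
region land in the same list and keep their relative order. Nothing in this
step bounds the total bytes per list, which is the gap the issue describes.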



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
