[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-06-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Fix Version/s: 1.3.2

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.
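
As an illustration of the fix proposed above, here is a minimal sketch of size-bounded batching, assuming a generic entry type and a caller-supplied size estimate; the names and structure are illustrative only, not the actual patch:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class SizeBoundedBatching {
  /**
   * Drain a list of edits into as many batches as needed so that no single
   * replication RPC exceeds the given size limit. The entry type and its size
   * estimate are stand-ins for the real WAL entry and its serialized size.
   */
  static <E> List<List<E>> splitBySize(List<E> edits, long rpcSizeLimit,
      ToLongFunction<E> sizeOf) {
    List<List<E>> batches = new ArrayList<>();
    List<E> current = new ArrayList<>();
    long currentSize = 0;
    for (E edit : edits) {
      long entrySize = sizeOf.applyAsLong(edit);
      // Start a new batch if this entry would push the current RPC over the limit,
      // but always keep at least one entry per batch so replication makes progress.
      if (!current.isEmpty() && currentSize + entrySize > rpcSizeLimit) {
        batches.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(edit);
      currentSize += entrySize;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
{code}

Each resulting batch can then be sent as its own replication RPC, so an oversized input list produces several calls instead of one overlarge request.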



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-30 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-27 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027-branch-1.patch
HBASE-18027.patch

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-27 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Fix Version/s: (was: 1.3.2)
   Status: Patch Available  (was: Open)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0, 1.4.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-26 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027-branch-1.patch
HBASE-18027.patch

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027-branch-1.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-26 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027-branch-1.patch
HBASE-18027.patch

Attaching the union of the earlier patches and Ashu's suggestion, for master and 
branch-1. I'll come back and set this Patch Available if local tests check out.

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027-branch-1.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027-branch-1.patch

The branch-1 patch is simpler. It keeps the logging changes and only the clamping 
of the replication queue capacity limit.

I did look into changing the logic in ReplicationSource to check whether we go over 
the limit _before_ adding an entry to the replication worklist, but the question then 
is what to do with that last-read entry. The handling must be robust to failure: an 
in-memory-only 'put back' queue will lose data upon failure, and seeking the reader 
back to before the most recent read did not prove robust under active chaos testing.

Instead we try to avoid creating an overlarge RPC by setting the replication queue 
capacity limit to the lesser of replication.source.size.capacity or 95% of the RPC 
size limit. This tries much harder than before to avoid the problem.

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027-branch-1.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Status: Open  (was: Patch Available)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0, 1.4.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-23 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Status: Open  (was: Patch Available)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0, 1.4.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-23 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Status: Patch Available  (was: Open)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0, 1.4.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-22 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027.patch

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-22 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027.patch

Updated patch fixes a findbugs warning. 

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, 
> HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-15 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Status: Patch Available  (was: Open)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1, 2.0.0, 1.4.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-15 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Attachment: HBASE-18027.patch

Ok, here is a patch that moves the discussed concerns up into the caller(s), in 
response to feedback from [~lhofhansl]. Although inside 
HBaseInterClusterReplicationEndpoint the batch of edits may be broken up into 
smaller batches for parallelization, up in the callers we can't see how the 
sub-batches will be partitioned. So we have to consider the RPC request limit 
with respect to the whole batch, which does simplify the change: if the 
replication batch capacity would exceed the RPC request limit, we simply use the 
RPC request limit as the replication batch capacity. The downside is that this is 
more pessimistic than checking limits right where we build the sub-batches, where 
we know the size of each.

Changes:


- Where we set replicationBatchSizeCapacity in 
ReplicationSourceWALReaderThread, check whether replicationBatchSizeCapacity 
exceeds the RPC request size limit, and if so use the request size limit instead, 
warning about it with a mention of the relevant configuration keys.

- When building the current replication batch, check whether we will exceed the 
batch size capacity _before_ adding an entry to the batch. Ensure at least one 
entry is added to the batch even if that violates quota or batch size capacity, 
since otherwise replication would become stuck (see the sketch after this list). 

- WALEntryStream needs a putBack() method to undo next() when we've decided the 
entry we just iterated to will cause the batch to exceed size limits.


and minor changes to logging:


- New debug logging in HBaseInterClusterReplicationEndpoint. One new DEBUG 
level line per replication batch. 

- Fix a trace-level log that conflated the total amount of data in the replication 
context with the size of the data in the worklist submitted as a Replicator runnable. 

- Add details to the warning logged if the number of edits replicated is 
different from the number received.


All *Replication* unit tests pass with these changes applied.
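
For illustration, a minimal sketch of the capacity check described in the second change above, assuming a WALEntryStream with the proposed putBack() method and a hypothetical estimateEntrySize() helper; this is a sketch of the idea, not the committed implementation:

{code:java}
// Illustrative batch assembly only: WALEntryStream.putBack() is the new method
// described above, and estimateEntrySize() is a hypothetical helper.
static List<WAL.Entry> readBatch(WALEntryStream entryStream, long batchSizeCapacity)
    throws IOException {
  List<WAL.Entry> batch = new ArrayList<>();
  long batchSize = 0;
  while (entryStream.hasNext()) {
    WAL.Entry entry = entryStream.next();
    long entrySize = estimateEntrySize(entry);  // hypothetical size estimate
    // Stop before exceeding capacity, but always take at least one entry so a
    // single oversized edit cannot stall replication.
    if (!batch.isEmpty() && batchSize + entrySize > batchSizeCapacity) {
      entryStream.putBack(entry);  // undo next(); re-read this entry in the next batch
      break;
    }
    batch.add(entry);
    batchSize += entrySize;
  }
  return batch;
}
{code}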


> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18027) Replication should respect RPC size limits when batching edits

2017-05-15 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18027:
---
Summary: Replication should respect RPC size limits when batching edits  
(was: HBaseInterClusterReplicationEndpoint should respect RPC size limits when 
batching edits)

> Replication should respect RPC size limits when batching edits
> --
>
> Key: HBASE-18027
> URL: https://issues.apache.org/jira/browse/HBASE-18027
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 1.3.1
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
> batches. We create N lists. N is the minimum of configured replicator 
> threads, number of 100-waledit batches, or number of current sinks. Every 
> pending entry in the replication context is then placed in order by hash of 
> encoded region name into one of these N lists. Each of the N lists is then 
> sent all at once in one replication RPC. We do not test if the sum of data in 
> each N list will exceed RPC size limits. This code presumes each individual 
> edit is reasonably small. Not checking for aggregate size while assembling 
> the lists into RPCs is an oversight and can lead to replication failure when 
> that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to 
> drain a list, keeping each RPC under limit, instead of assuming the whole 
> list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)