[ 
https://issues.apache.org/jira/browse/HBASE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-9501:
--------------------------------

    Attachment: HBASE-9501-trunk_v1.patch

As this JIRA's description says, this feature prevents replication from 
congesting the source-peer network channel. Congestion can occur when a peer 
is re-enabled after being disabled for some time: the source then pushes the 
accumulated hlog entries to the peer cluster at full speed. Such a full-speed 
push without any throttling can affect other applications that share the same 
cluster-to-cluster bandwidth.

Some notes about the trunk patch:
1. The throttling cycle is 100 ms. Within each cycle, the push size from a 
single source node can't exceed (perPeerNodeBandwidth/10); the unit of 
perPeerNodeBandwidth is bytes per second.
2. If a single push exceeds (perPeerNodeBandwidth/10), the source sleeps 
through some of the following cycles to amortize it.
3. By default perPeerNodeBandwidth is 0, which means no throttling, so the 
behavior stays the same as before unless perPeerNodeBandwidth is explicitly 
configured.
4. A unit test, testThrottling in TestReplicationSmallTests, is added to 
verify the throttling effect: it sees a >4x difference in delay when 
perPeerNodeBandwidth is configured with a 5x difference. By checking log 
output added only for debugging, I confirmed that the sleeps in all possible 
throttling paths are covered by this unit test.
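
To make notes 1-3 concrete, here is a rough sketch of the accounting. It is 
not the code from the attached patch: the class name PushThrottler, its method 
names, and the explicit nowMs clock parameter are all made up for 
illustration, and the real patch may round or reset differently.

```java
/** Hypothetical sketch of per-cycle push throttling; not the patch code. */
final class PushThrottler {
    private static final long CYCLE_MS = 100; // throttling cycle (note 1)

    private final long quotaPerCycle; // bytes allowed per 100 ms cycle
    private long windowStart;         // start of the current window, in ms
    private long pushedInWindow;      // bytes pushed since windowStart

    /** perPeerNodeBandwidth is in bytes per second; 0 disables throttling. */
    PushThrottler(long perPeerNodeBandwidth, long nowMs) {
        // Integer division: a bandwidth below 10 also yields 0 (disabled).
        this.quotaPerCycle = perPeerNodeBandwidth / 10;
        this.windowStart = nowMs;
    }

    boolean isEnabled() {
        return quotaPerCycle > 0;
    }

    /** Window length: one cycle, or several if an oversized push must be amortized. */
    private long windowMs() {
        long cycles = Math.max(1, (pushedInWindow + quotaPerCycle - 1) / quotaPerCycle);
        return cycles * CYCLE_MS;
    }

    /** Milliseconds the caller should sleep before pushing size bytes. */
    long sleepBeforePush(long size, long nowMs) {
        if (!isEnabled()) {
            return 0; // note 3: no throttling by default
        }
        long elapsed = nowMs - windowStart;
        long window = windowMs();
        if (elapsed >= window) {
            return 0; // window is over, the budget is fresh
        }
        if (pushedInWindow == 0 || pushedInWindow + size <= quotaPerCycle) {
            return 0; // first push of the window, or still within quota
        }
        return window - elapsed; // wait out the rest of the window (notes 1 and 2)
    }

    /** Record a completed push of size bytes at time nowMs. */
    void recordPush(long size, long nowMs) {
        if (!isEnabled()) {
            return;
        }
        if (nowMs - windowStart >= windowMs()) {
            windowStart = nowMs; // start a new window
            pushedInWindow = 0;
        }
        pushedInWindow += size;
    }
}
```

In this sketch the caller would ask sleepBeforePush() before each push, sleep 
for the returned time if it is non-zero, and call recordPush() once the batch 
has been shipped. Passing the clock in explicitly just keeps the accounting 
easy to check without real sleeps.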

ping [~jdcryans] and [~ndimiduk] for review, thanks a lot :-)


> No throttling for replication
> -----------------------------
>
>                 Key: HBASE-9501
>                 URL: https://issues.apache.org/jira/browse/HBASE-9501
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-9501-trunk_v0.patch, HBASE-9501-trunk_v1.patch
>
>
> When we disable a peer for a period of time and then enable it, the 
> ReplicationSource in the master cluster pushes the hlog entries accumulated 
> during the disabled interval to the re-enabled peer cluster at full speed.
> If the bandwidth between the two clusters is shared by different 
> applications, the full-speed replication push can use all the bandwidth and 
> severely affect other applications.
> There are two configs, replication.source.size.capacity and 
> replication.source.nb.capacity, to tweak the batch size each push 
> delivers, but if we decrease them, the number of pushes increases, 
> and all these pushes proceed continuously without pause, so this is of no 
> obvious help for bandwidth throttling.
> From a bandwidth-sharing and push-speed perspective, it's more reasonable to 
> provide an upper bandwidth limit for each peer push channel; within that 
> limit, the peer can choose a big batch size for each push for bandwidth 
> efficiency.
> Any opinion?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
