[ https://issues.apache.org/jira/browse/HBASE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Honghua updated HBASE-9501:
--------------------------------
    Attachment: HBASE-9501-trunk_v1.patch

As this JIRA's description says, this feature prevents replication from congesting the source-peer network channel. Congestion can occur when a peer is re-enabled after being disabled for some time: the source then pushes the accumulated hlog entries to the peer cluster at full speed. Such a full-speed push without any throttling can affect other applications that share the same cluster-to-cluster bandwidth.

Some notes about the trunk patch:
1. The throttling cycle is 100 ms. Within each cycle, the push size from a single source node can't exceed (perPeerNodeBandwidth/10); the unit of perPeerNodeBandwidth is bytes per second.
2. If a single push exceeds (perPeerNodeBandwidth/10), the source sleeps for some of the following cycles to amortize it.
3. By default perPeerNodeBandwidth is 0, which means no throttling, so the behavior stays the same as before if perPeerNodeBandwidth is not explicitly configured.
4. A unit test, testThrottling in TestReplicationSmallTests, is added to verify the throttling effect. It sees a more than 4x difference in delay when perPeerNodeBandwidth is configured with a 5x difference. By checking log output added only for debugging, I confirmed that the sleeps in all possible throttling paths are covered by this unit test.

Ping [~jdcryans] and [~ndimiduk] for review, thanks a lot :-)

> No throttling for replication
> -----------------------------
>
>                 Key: HBASE-9501
>                 URL: https://issues.apache.org/jira/browse/HBASE-9501
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-9501-trunk_v0.patch, HBASE-9501-trunk_v1.patch
>
>
> When we disable a peer for a period of time and then enable it, the
> ReplicationSource in the master cluster will push the hlog entries
> accumulated during the disabled interval to the re-enabled peer cluster
> at full speed.
> If the bandwidth between the two clusters is shared by different
> applications, a full-speed replication push can use all of the bandwidth
> and severely affect the other applications.
> There are two configs, replication.source.size.capacity and
> replication.source.nb.capacity, for tweaking the batch size that each push
> delivers. But if we decrease these two configs, the number of pushes
> increases, and all these pushes proceed continuously without pause, so
> this is of no obvious help for bandwidth throttling.
> From a bandwidth-sharing and push-speed perspective, it's more reasonable
> to provide a bandwidth upper limit for each peer push channel. Within that
> limit, a peer can choose a big batch size for each push for bandwidth
> efficiency.
> Any opinion?

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
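
For illustration, the scheme in notes 1-3 of the comment could be sketched roughly as below. This is a hypothetical sketch under stated assumptions, not the code in the attached patch: the class name ThrottleSketch, the method sleepMillisBefore, and the explicit `now` argument (passed in so the example is deterministic) are all invented here. The caller is assumed to sleep for the returned time and then push.

```java
// Hypothetical sketch of per-cycle push throttling (not the patch's code).
class ThrottleSketch {
    static final long CYCLE_MS = 100;   // throttling cycle length (note 1)

    private final boolean enabled;      // false when bandwidth is 0 (note 3)
    private final long cycleBudget;     // bytes allowed per cycle
    private long cycleStart = 0;        // start time (ms) of the current cycle
    private long pushedInCycle = 0;     // bytes accounted to the current cycle

    ThrottleSketch(long perPeerNodeBandwidth) {   // bytes per second
        this.enabled = perPeerNodeBandwidth > 0;
        // Ten 100 ms cycles per second; keep the budget at least 1 byte.
        this.cycleBudget = Math.max(1, perPeerNodeBandwidth / 10);
    }

    /** Milliseconds to sleep before pushing `size` bytes at time `now` (ms). */
    long sleepMillisBefore(long size, long now) {
        if (!enabled) {
            return 0;                   // no throttling, same as before
        }
        long sleep = 0;
        if (pushedInCycle > cycleBudget) {
            // Note 2: an oversized push is amortized over following cycles.
            long cycles = (pushedInCycle + cycleBudget - 1) / cycleBudget; // ceil
            long resumeAt = cycleStart + cycles * CYCLE_MS;
            sleep = Math.max(0, resumeAt - now);
            startCycle(now + sleep);
        } else if (now >= cycleStart + CYCLE_MS) {
            startCycle(now);            // the current cycle has already ended
        } else if (pushedInCycle > 0 && pushedInCycle + size > cycleBudget) {
            // This push would exceed the budget: delay it to the next cycle.
            sleep = cycleStart + CYCLE_MS - now;
            startCycle(now + sleep);
        }
        pushedInCycle += size;          // account the push to the active cycle
        return sleep;
    }

    private void startCycle(long start) {
        cycleStart = start;
        pushedInCycle = 0;
    }

    public static void main(String[] args) {
        ThrottleSketch t = new ThrottleSketch(1000);     // budget: 100 bytes/cycle
        System.out.println(t.sleepMillisBefore(60, 0));  // 0: fits in the cycle
        System.out.println(t.sleepMillisBefore(60, 10)); // 90: wait for next cycle
    }
}
```

With perPeerNodeBandwidth = 0 the method always returns 0, which matches the default of no throttling described in note 3.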