Stephen O'Donnell created HDDS-8505:
---------------------------------------

             Summary: ReplicationManager: Add configurable global replication limit
                 Key: HDDS-8505
                 URL: https://issues.apache.org/jira/browse/HDDS-8505
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: SCM
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


We should make it possible to configure a global replication limit that caps 
the number of inflight containers pending creation. A larger cluster can 
sustain more inflight replication than a smaller one, so the limit should be 
a function of three things: the number of datanodes in the cluster, the 
number of commands which can be queued per datanode, and a weighting factor.

For example, if each datanode can queue 20 replication commands, and there are 
100 nodes in the cluster, then the natural limit is 20 * 100 = 2000. However, 
that assumes commands are queued evenly across all datanodes, which is 
unlikely. With a global limit we would prefer that not all datanodes are fully 
loaded with replication commands at the same time, so we may want to impose a 
limit of half that number by using a factor of 0.5, e.g. 20 * 100 * 0.5 = 1000 
pending replications.
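
The arithmetic above can be sketched in a few lines of Python. This is only 
an illustration of the proposed formula, not the ReplicationManager code: the 
function names, parameter names and the rounding-down behaviour are all 
assumptions made for the example.

```python
import math


def global_replication_limit(datanode_count: int,
                             per_datanode_limit: int,
                             factor: float) -> int:
    """Hypothetical helper: cluster-wide cap on inflight replications.

    limit = datanodes * commands queued per datanode * weighting factor
    Rounding down is an assumption; the issue does not specify it.
    """
    if datanode_count < 0 or per_datanode_limit < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return math.floor(datanode_count * per_datanode_limit * factor)


def can_schedule(inflight: int, limit: int) -> bool:
    """Hypothetical check: allow a new replication only while under the cap."""
    return inflight < limit


# The example from the description: 100 datanodes, 20 commands each, factor 0.5.
limit = global_replication_limit(100, 20, 0.5)
print(limit)  # 1000
```

Because the limit is derived from the datanode count, it scales with the 
cluster: the same settings on a 10 node cluster would give a cap of 100.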

At one extreme this would result in every datanode in the cluster having half 
its maximum tasks queued, but in practice some DNs are likely to be at their 
limit while others have fewer than half queued, or none at all.

If the limits were perfectly defined, such that a datanode completes all its 
queued work exactly at the end of one heartbeat interval, then halving the 
number of queued commands would keep the datanode busy for only half of its 
heartbeat interval. As the datanodes all heartbeat at different times, the 
busy and idle periods across all the datanodes would combine into a load 
profile in which some datanodes are idle at any given moment, reducing the 
overall load on the cluster.
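
A toy model of that load profile, under strong simplifying assumptions: the 
heartbeat interval is split into 100 ticks, the 100 datanodes heartbeat at 
evenly staggered ticks, and each one is busy for exactly the weighted 
fraction of the interval after its heartbeat. None of these names or numbers 
come from Ozone; they exist only to illustrate the reasoning.

```python
def busy_datanodes(tick: int,
                   heartbeat_offsets: list[int],
                   interval_ticks: int,
                   busy_fraction: float) -> int:
    """Count datanodes still working through their queue at a given tick.

    A datanode is modelled as busy from its heartbeat until
    busy_fraction * interval_ticks later, then idle until the next heartbeat.
    """
    busy_for = interval_ticks * busy_fraction
    return sum(1 for offset in heartbeat_offsets
               if (tick - offset) % interval_ticks < busy_for)


interval_ticks = 100
# Assumption: heartbeats are spread evenly, one datanode per tick.
offsets = list(range(100))

# With a factor of 0.5, half the datanodes are idle at every tick.
print(busy_datanodes(0, offsets, interval_ticks, 0.5))  # 50
# With the full per-datanode limit, every datanode is busy all the time.
print(busy_datanodes(0, offsets, interval_ticks, 1.0))  # 100
```

Real heartbeats are not evenly staggered and work is not evenly queued, so 
the actual profile would be noisier than this model suggests.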



