[ https://issues.apache.org/jira/browse/HDDS-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090034#comment-18090034 ]

Wei-Chiu Chuang commented on HDDS-15071:
----------------------------------------

  ### 1. Cluster-wide Queue and Load Limits (SCM Side)

  The Storage Container Manager (SCM) uses these parameters in ReplicationManager.java to avoid overloading datanodes with replication and reconstruction tasks:

  • hdds.scm.replication.datanode.replication.limit (Default: 20)
  Restricts the total number of replication and reconstruction commands queued on a single datanode.
  • hdds.scm.replication.datanode.reconstruction.weight (Default: 3)
  Determines how much load a reconstruction command adds compared to a standard replication command. Because reconstruction is more resource-intensive, each queued reconstruction command counts as this many units against the per-datanode limit.
  • hdds.scm.replication.inflight.limit.factor (Default: 0.75)
  Scales down the global limit on replication and reconstruction tasks pending across the entire cluster. The global limit is calculated dynamically as:

    Global Limit = Healthy Nodes × datanode.replication.limit × inflight.limit.factor
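
  A minimal Java sketch of how the three parameters combine, using the default values above. The class and method names (ReplicationLimits, weightedLoad, isOverloaded, globalLimit) are hypothetical and not Ozone's ReplicationManager API. Two details are assumptions rather than confirmed behavior: that a datanode stops receiving commands once its weighted load reaches the limit, and that the global limit is rounded down.

```java
/**
 * Hypothetical sketch of the per-datanode and cluster-wide limits described above.
 * Class and method names are illustrative; they are not Ozone's ReplicationManager API.
 */
public final class ReplicationLimits {

  // Default values quoted in the comment.
  static final int DATANODE_REPLICATION_LIMIT = 20;   // hdds.scm.replication.datanode.replication.limit
  static final int RECONSTRUCTION_WEIGHT = 3;         // hdds.scm.replication.datanode.reconstruction.weight
  static final double INFLIGHT_LIMIT_FACTOR = 0.75;   // hdds.scm.replication.inflight.limit.factor

  /** Weighted queue size of one datanode: each reconstruction counts as RECONSTRUCTION_WEIGHT units. */
  static int weightedLoad(int queuedReplications, int queuedReconstructions) {
    return queuedReplications + RECONSTRUCTION_WEIGHT * queuedReconstructions;
  }

  /** Assumed check: a datanode at or above the limit receives no further commands. */
  static boolean isOverloaded(int queuedReplications, int queuedReconstructions) {
    return weightedLoad(queuedReplications, queuedReconstructions) >= DATANODE_REPLICATION_LIMIT;
  }

  /** Global Limit = Healthy Nodes x datanode.replication.limit x inflight.limit.factor, rounded down (assumed). */
  static long globalLimit(int healthyNodes) {
    return (long) Math.floor(healthyNodes * DATANODE_REPLICATION_LIMIT * INFLIGHT_LIMIT_FACTOR);
  }

  public static void main(String[] args) {
    System.out.println(globalLimit(10));      // 10 x 20 x 0.75 = 150
    System.out.println(weightedLoad(5, 4));   // 5 + 3 x 4 = 17
    System.out.println(isOverloaded(5, 4));   // false: 17 < 20
    System.out.println(isOverloaded(5, 5));   // true: 20 >= 20
  }
}
```

  With 10 healthy datanodes the defaults allow 150 pending tasks cluster-wide, while a single datanode holding 5 replication and 4 reconstruction commands already uses 17 of its 20 units.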

> [SCM] Add configuration and global reconstruction limit
> -------------------------------------------------------
>
>                 Key: HDDS-15071
>                 URL: https://issues.apache.org/jira/browse/HDDS-15071
>             Project: Apache Ozone
>          Issue Type: Task
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>
>   Introduce the foundational configuration properties in ReplicationManagerConfiguration:
>    * hdds.scm.replication.decommission.ec.reconstruction.enabled
>    * hdds.scm.replication.decommission.ec.reconstruction.load.factor (default 0.9)
>    * hdds.scm.replication.reconstruction.global.limit
>   Implement an atomic counter in ReplicationManager to track active ReconstructECContainersCommand tasks cluster-wide. Update UnhealthyReplicationProcessor to check this counter against the global limit before dequeuing containers for reconstruction, ensuring that aggregate bisection bandwidth is protected.
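
The proposed global limit can be sketched as follows, assuming a single AtomicInteger of in-flight ReconstructECContainersCommand tasks that is incremented when a command is dispatched and decremented when it completes or times out. GlobalReconstructionLimiter, its methods, and the string container IDs are hypothetical; the two decommission-related properties are left out because the issue does not describe how they are applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: a cluster-wide cap on in-flight EC reconstruction tasks.
 * Names are illustrative; this is not Ozone's ReplicationManager code.
 */
public final class GlobalReconstructionLimiter {

  private final int globalLimit;   // stands in for hdds.scm.replication.reconstruction.global.limit
  private final AtomicInteger inFlight = new AtomicInteger();

  public GlobalReconstructionLimiter(int globalLimit) {
    this.globalLimit = globalLimit;
  }

  /** Reserves one slot if below the limit; the CAS loop keeps concurrent callers from overshooting. */
  public boolean tryAcquire() {
    while (true) {
      int current = inFlight.get();
      if (current >= globalLimit) {
        return false;
      }
      if (inFlight.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  /** Frees a slot when a reconstruction command completes or times out (assumed hook). */
  public void release() {
    inFlight.updateAndGet(n -> Math.max(0, n - 1));
  }

  public int inFlight() {
    return inFlight.get();
  }

  /** Dequeues containers only while the global limit allows; the rest stay queued for a later run. */
  public List<String> dequeueForReconstruction(Queue<String> underReplicated) {
    List<String> scheduled = new ArrayList<>();
    while (!underReplicated.isEmpty() && tryAcquire()) {
      scheduled.add(underReplicated.poll());
    }
    return scheduled;
  }

  public static void main(String[] args) {
    GlobalReconstructionLimiter limiter = new GlobalReconstructionLimiter(2);
    Queue<String> queue = new ArrayDeque<>(List.of("c1", "c2", "c3"));
    System.out.println(limiter.dequeueForReconstruction(queue)); // [c1, c2]
    System.out.println(queue);                                    // [c3]
    limiter.release();                                            // c1 finished
    System.out.println(limiter.dequeueForReconstruction(queue)); // [c3]
  }
}
```

A compare-and-set loop, rather than calling incrementAndGet() and checking afterwards, keeps concurrent callers from pushing the counter past the limit even briefly.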



--
This message was sent by Atlassian Jira
(v8.20.10#820010)