[jira] [Updated] (HDDS-15330) Implement SCM FCR rate limit

Ivan Andika (Jira) Wed, 20 May 2026 20:06:06 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Andika updated HDDS-15330:
-------------------------------
    Attachment: scm_Top_Components_oom.zip
                scm_System_Overview_oom.zip
                scm_Leak_Suspects_oom.zip

> Implement SCM FCR rate limit
> ----------------------------
>
>                 Key: HDDS-15330
>                 URL: https://issues.apache.org/jira/browse/HDDS-15330
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: scm_Leak_Suspects_oom.zip, scm_System_Overview_oom.zip, 
> scm_Top_Components_oom.zip
>
>
> We have seen previous instances where a newly bootstrapped SCM ran out of 
> memory (OOM), even with a 96GB heap. We suspect the cause is the number of 
> full container reports (FCRs) being processed concurrently in SCM. 
> HDFS rate-limits full block reports (HDFS-7923) using 
> BlockReportLeaseManager, which reduces the number of concurrent block reports 
> held in the NameNode. Ozone should implement a similar mechanism to prevent 
> FCR storms.
> A possible design is to register the DN first, without including the full 
> FCR in the registration. SCM then grants only N datanodes permission to send 
> FCRs at any one time, similar to the HDFS implementation.
> Another possibility, to reduce the size of a single FCR, is to split the FCR 
> into one FCR per volume (can be considered in the future). 
> One tradeoff of rate limiting is that a new SCM might take longer to exit 
> SafeMode. However, this is better than an SCM OOM. Another tradeoff is that 
> FCRs might be delayed in large clusters (we need to think about this).
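
The lease-based design in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ozone or HDFS implementation: the class name FcrLeaseManager, its methods, and the lease timeout are all hypothetical, and the caller passes in the current time so the behaviour is easy to reason about.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: caps how many datanodes may send a full container
 * report (FCR) at the same time. Names and parameters are assumptions.
 */
final class FcrLeaseManager {
  private final int maxConcurrentLeases;   // the "N" from the proposal
  private final long leaseTimeoutMs;       // assumed: leases expire if unused
  // datanode id -> lease expiry time in milliseconds
  private final Map<String, Long> activeLeases = new HashMap<>();

  FcrLeaseManager(int maxConcurrentLeases, long leaseTimeoutMs) {
    if (maxConcurrentLeases <= 0 || leaseTimeoutMs <= 0) {
      throw new IllegalArgumentException("limits must be positive");
    }
    this.maxConcurrentLeases = maxConcurrentLeases;
    this.leaseTimeoutMs = leaseTimeoutMs;
  }

  /** A datanode asks (for example, on heartbeat) whether it may send its FCR. */
  synchronized boolean tryAcquire(String datanodeId, long nowMs) {
    // Drop leases whose holders never sent (or never finished) their report.
    activeLeases.values().removeIf(expiry -> expiry <= nowMs);
    if (activeLeases.containsKey(datanodeId)) {
      return true; // already holds a lease; the expiry is not renewed
    }
    if (activeLeases.size() >= maxConcurrentLeases) {
      return false; // at capacity: the datanode should retry later
    }
    activeLeases.put(datanodeId, nowMs + leaseTimeoutMs);
    return true;
  }

  /** Called once the FCR from this datanode has been fully processed. */
  synchronized void release(String datanodeId) {
    activeLeases.remove(datanodeId);
  }

  synchronized int activeCount() {
    return activeLeases.size();
  }
}
```

With a limit of 2, a third datanode is refused until one of the first two releases its lease or the lease times out. The timeout is there so that a datanode that dies after being granted a lease does not hold a slot forever; how long it should be, and how a refused datanode backs off, are open design questions.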



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
