[ https://issues.apache.org/jira/browse/HDDS-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika updated HDDS-15330:
-------------------------------
    Summary: Implement SCM FCR rate limit  (was: Implement SCM FCR rate-limit)

> Implement SCM FCR rate limit
> ----------------------------
>
>                 Key: HDDS-15330
>                 URL: https://issues.apache.org/jira/browse/HDDS-15330
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> We have seen previous instances where a newly bootstrapped SCM ran out of
> memory (OOM), even with a 96 GB heap. We suspect the cause is the large
> number of full container reports (FCRs) that SCM processes concurrently.
> HDFS implements a rate limit for full block reports in HDFS-7923, using
> BlockReportLeaseManager to reduce the number of concurrent block reports
> held in the NameNode. Ozone should implement a similar mechanism to prevent
> FCR storms.
> A possible design is to register the DN first without including the full
> FCR in the registration. SCM then grants only N datanodes permission to send
> FCRs at a time, similar to the HDFS implementation.
> Another possibility, which reduces the size of a single FCR, is to split the
> FCR into one FCR per volume (this can be considered in the future).
> One tradeoff of rate limiting is that a new SCM might take longer to exit
> SafeMode. However, this is better than an SCM OOM.
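
The "only N datanodes at a time" idea in the description can be illustrated with a minimal lease-manager sketch. This is not Ozone or HDFS code: the class name FcrLeaseManager, its methods, the string datanode IDs, and the lease timeout are all hypothetical and chosen only for illustration. The sketch assumes a datanode asks for a lease on each heartbeat and sends its FCR only once a lease is granted. It also assumes leases expire, so that a datanode that dies mid-report does not hold a slot forever.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: grants at most maxLeases concurrent FCR leases.
 * Not an existing Ozone class; names and behavior are assumptions.
 */
final class FcrLeaseManager {
  private final int maxLeases;
  private final long leaseTimeoutMs;
  // Datanode ID -> lease expiry time in milliseconds.
  private final Map<String, Long> leases = new HashMap<>();

  FcrLeaseManager(int maxLeases, long leaseTimeoutMs) {
    if (maxLeases <= 0 || leaseTimeoutMs <= 0) {
      throw new IllegalArgumentException("limits must be positive");
    }
    this.maxLeases = maxLeases;
    this.leaseTimeoutMs = leaseTimeoutMs;
  }

  /** Returns true if the datanode may send its FCR now. */
  synchronized boolean tryAcquire(String datanodeId, long nowMs) {
    expire(nowMs);
    if (leases.containsKey(datanodeId)) {
      return true; // already holds a live lease
    }
    if (leases.size() >= maxLeases) {
      return false; // all slots busy; ask again on a later heartbeat
    }
    leases.put(datanodeId, nowMs + leaseTimeoutMs);
    return true;
  }

  /** Frees the slot once the datanode's FCR has been processed. */
  synchronized void release(String datanodeId) {
    leases.remove(datanodeId);
  }

  /** Number of unexpired leases at the given time. */
  synchronized int activeLeases(long nowMs) {
    expire(nowMs);
    return leases.size();
  }

  // Drops leases whose expiry time has passed.
  private void expire(long nowMs) {
    leases.values().removeIf(expiry -> expiry <= nowMs);
  }
}
```

With maxLeases = 2, a third datanode calling tryAcquire is refused until one of the first two calls release or its lease times out. That bounds the number of FCRs in flight, at the cost of a slower SafeMode exit, which is the tradeoff the description mentions.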



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

