Ivan Andika created HDDS-15330:
----------------------------------
Summary: Implement SCM FCR rate-limit
Key: HDDS-15330
URL: https://issues.apache.org/jira/browse/HDDS-15330
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Ivan Andika
Assignee: Ivan Andika
We have seen previous instances where a newly bootstrapped SCM ran out of
memory (OOM), even with a 96GB heap. We suspect this is caused by the number of
full container reports (FCRs) being processed concurrently in SCM.
HDFS implemented rate limiting for full block reports in HDFS-7923, using
BlockReportLeaseManager to reduce the number of block reports held concurrently
in the NameNode. Ozone should implement a similar mechanism to prevent FCR
storms.
A possible design is to register the DN first but defer its full FCR. SCM then
grants only N datanodes permission to send FCRs at any one time, similar to the
HDFS implementation.
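
To make the idea concrete, here is a minimal sketch of how such a lease-based
limit could look. It is only loosely modeled on the HDFS approach and is much
simpler. The class name FcrLeaseManager, its methods, and the two parameters
(maximum concurrent leases and lease timeout) are hypothetical and do not exist
in Ozone today. Time is passed in explicitly to keep the sketch easy to test.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: not an existing Ozone class.
 * Grants at most maxConcurrentLeases datanodes permission to send a full
 * container report (FCR) at the same time.
 */
class FcrLeaseManager {
  private final int maxConcurrentLeases;  // the "N" in the proposal (assumed config)
  private final long leaseTimeoutMs;      // assumed config: lease lifetime
  // Datanode ID -> lease expiry time in milliseconds.
  private final Map<String, Long> activeLeases = new HashMap<>();

  FcrLeaseManager(int maxConcurrentLeases, long leaseTimeoutMs) {
    this.maxConcurrentLeases = maxConcurrentLeases;
    this.leaseTimeoutMs = leaseTimeoutMs;
  }

  /**
   * Would be called when a registered DN heartbeats and asks to send its FCR.
   * Returns true if the DN holds (or is newly granted) a lease.
   */
  synchronized boolean tryAcquire(String datanodeId, long nowMs) {
    // Drop leases of datanodes that never completed their report in time.
    activeLeases.values().removeIf(expiry -> expiry <= nowMs);
    if (activeLeases.containsKey(datanodeId)) {
      return true;
    }
    if (activeLeases.size() >= maxConcurrentLeases) {
      return false;  // DN should retry on a later heartbeat
    }
    activeLeases.put(datanodeId, nowMs + leaseTimeoutMs);
    return true;
  }

  /** Would be called once SCM has finished processing the DN's FCR. */
  synchronized boolean release(String datanodeId) {
    return activeLeases.remove(datanodeId) != null;
  }

  /** Number of leases currently tracked (expired ones are purged lazily). */
  synchronized int activeCount() {
    return activeLeases.size();
  }
}
```

In this sketch a DN that is refused a lease simply asks again on a later
heartbeat, and the timeout keeps a crashed DN from holding a slot forever.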
One tradeoff of rate limiting is that a new SCM might take longer to exit
SafeMode. However, this is preferable to an SCM OOM.