Xushaohong opened a new pull request #1942: URL: https://github.com/apache/ozone/pull/1942
## What changes were proposed in this pull request?

Background: The current retry policy of a datanode (DN) is to resend at a fixed 1s interval. If all DNs lose their connection to the SCM at the same moment, for example because the SCM restarts, they all send their container reports to the SCM at nearly the same time. The result is a ContainerReport storm.

Solution: Make the rpc-retry-interval configurable. On a very large cluster, manually tuning rpc-retry-interval together with rpc-retry-count can mitigate extreme cases such as OOM.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4754

## How was this patch tested?

CI
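To illustrate how a configurable interval interacts with the retry count, here is a minimal sketch of a fixed-interval retry loop. It is not the code from this patch: the class `FixedIntervalRetry` and its methods are hypothetical, and the two constructor arguments only stand in for rpc-retry-count and rpc-retry-interval.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical sketch of a fixed-interval retry policy; not the Ozone implementation. */
public class FixedIntervalRetry {
    private final int maxRetries;     // stands in for rpc-retry-count
    private final Duration interval;  // stands in for rpc-retry-interval

    public FixedIntervalRetry(int maxRetries, Duration interval) {
        if (maxRetries < 0 || interval.isNegative()) {
            throw new IllegalArgumentException("retry count and interval must be non-negative");
        }
        this.maxRetries = maxRetries;
        this.interval = interval;
    }

    /** Upper bound on the time spent sleeping between attempts. */
    public Duration maxTotalWait() {
        return interval.multipliedBy(maxRetries);
    }

    /** Runs the call once, then retries up to maxRetries times with a fixed sleep in between. */
    public <T> T call(Callable<T> rpc) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return rpc.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    // Sleep only when another attempt will follow.
                    Thread.sleep(interval.toMillis());
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

With a larger interval, each DN issues fewer requests per unit of time while the SCM is recovering, and the product of interval and count bounds how long a DN keeps retrying before it gives up.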
