[
https://issues.apache.org/jira/browse/HDDS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal updated HDDS-15238:
---------------------------------
Description:
ContainerSafeModeRule, for both Ratis and EC, re-initializes its containers
on refresh, but it does not remove the containers that were already reported
during DN registration to SCM.
So getStatusText() also shows containers that have already been reported.
Further, on reinitalize(), it picks up new containers that were recently
added to ContainerManager from the DN, but there is no further notification
to count those containers.
e.g.:
* {{Initially: Refreshed RATIS Containers threshold count to 6}}
* {{Later, Refreshed RATIS Containers threshold count to 7}}
* {{Later, Refreshed RATIS Containers threshold count to 8}}
So a DN that is already registered has reported only 6 containers. If new
containers are added and closed before safe mode exits, safe mode may never
exit.
This will happen mostly on the *Follower*, where new containers are added as
a result of:
* sync from the Leader node
* a pending Ratis transaction updating the DB with a new container (may also
impact the leader)
But after refresh, a newly added container will never be counted through any
notification from the DN, as that notification is sent only with the
registration request.
This can prevent the safe mode rule from exiting.
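
The scenario above can be sketched with a small stand-alone model. This is
only an illustration under stated assumptions: SafeModeRuleSketch and all of
its methods are hypothetical names, not the actual Ozone
ContainerSafeModeRule code, and the container IDs are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the behaviour described in this issue.
// It is NOT the Ozone ContainerSafeModeRule implementation.
class SafeModeRuleSketch {
    // Containers the rule is waiting for (rebuilt on every refresh).
    private final Set<Long> expected = new HashSet<>();
    // Containers counted from DN registration reports.
    private final Set<Long> reported = new HashSet<>();

    // Models refresh: reload the expected set from the container manager.
    // Already reported containers are put back into the expected set too.
    public void refresh(List<Long> containersInManager) {
        expected.clear();
        expected.addAll(containersInManager);
    }

    // Models the report a DN sends once, when it registers with SCM.
    public void onDatanodeRegistration(List<Long> containersOnDn) {
        for (Long id : containersOnDn) {
            if (expected.contains(id)) {
                reported.add(id);
            }
        }
    }

    // The rule passes only when every expected container has been reported.
    public boolean validate() {
        return reported.containsAll(expected);
    }

    public int expectedCount() {
        return expected.size();
    }

    public int reportedCount() {
        return reported.size();
    }

    public static void main(String[] args) {
        SafeModeRuleSketch rule = new SafeModeRuleSketch();
        rule.refresh(List.of(1L, 2L, 3L, 4L, 5L, 6L));
        rule.onDatanodeRegistration(List.of(1L, 2L, 3L, 4L, 5L, 6L));
        System.out.println("before late refresh, rule passes: " + rule.validate());

        // Container 7 reaches the container manager later (for example via
        // sync from the leader). The DN has already registered, so in this
        // model nothing reports container 7 again.
        rule.refresh(List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L));
        System.out.println("expected=" + rule.expectedCount()
            + " reported=" + rule.reportedCount()
            + " rule passes: " + rule.validate());
    }
}
```

In this model the expected count grows from 6 to 7 after the second refresh
while the reported count stays at 6, so validate() keeps returning false
unless some further notification arrives.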
was:
ContainerSafeModeRule, for both Ratis and EC, re-initializes its containers
on refresh, but it does not remove the containers that were already reported
during DN registration to SCM.
So getStatusText() also shows containers that have already been reported.
Further, on reinitalize(), it picks up new containers that were recently
added to ContainerManager from the DN, but there is no further notification
to count those containers.
This can prevent the safe mode rule from exiting.
> ContainerSafeModeRule shows already reported containers in sample status
> ------------------------------------------------------------------------
>
> Key: HDDS-15238
> URL: https://issues.apache.org/jira/browse/HDDS-15238
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Sumit Agrawal
> Priority: Major
>
> ContainerSafeModeRule, for both Ratis and EC, re-initializes its containers
> on refresh, but it does not remove the containers that were already
> reported during DN registration to SCM.
> So getStatusText() also shows containers that have already been reported.
>
> Further, on reinitalize(), it picks up new containers that were recently
> added to ContainerManager from the DN, but there is no further notification
> to count those containers.
>
> e.g.:
> * {{Initially: Refreshed RATIS Containers threshold count to 6}}
> * {{Later, Refreshed RATIS Containers threshold count to 7}}
> * {{Later, Refreshed RATIS Containers threshold count to 8}}
> So a DN that is already registered has reported only 6 containers. If new
> containers are added and closed before safe mode exits, safe mode may never
> exit.
>
> This will happen mostly on the *Follower*, where new containers are added
> as a result of:
> * sync from the Leader node
> * a pending Ratis transaction updating the DB with a new container (may
> also impact the leader)
>
> But after refresh, a newly added container will never be counted through
> any notification from the DN, as that notification is sent only with the
> registration request.
> This can prevent the safe mode rule from exiting.