[ https://issues.apache.org/jira/browse/HDDS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Agrawal updated HDDS-15238:
---------------------------------
    Description: 
During refresh, ContainerSafeModeRule (for both Ratis and EC) re-initializes its 
containers. However, it does not remove containers that DNs already reported 
while registering with SCM, so getStatusText() still lists containers that have 
already been reported.
 
Further, reinitialize() picks up containers that were recently added to 
ContainerManager, but no further DN notification arrives to count those 
containers.
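To illustrate both problems, here is a minimal, hypothetical model of the rule's bookkeeping. It is not Ozone's ContainerSafeModeRule: the class RefreshModel and its methods are assumptions for illustration, and plain container IDs stand in for real container state. refreshRebuildingAll mimics the refresh behaviour described above, while refreshExcludingReported sketches one possible alternative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of a container safe mode rule's bookkeeping.
// This is NOT Ozone's ContainerSafeModeRule; all names here are illustrative.
class RefreshModel {
  // Container IDs the rule is still waiting on (what the status text lists).
  private final Set<Long> pending = new TreeSet<>();
  // Container IDs already reported by DNs while registering with SCM.
  private final Set<Long> reported = new HashSet<>();

  // A DN registers and reports the containers it holds.
  void onRegistrationReport(long... containerIds) {
    for (long id : containerIds) {
      reported.add(id);
      pending.remove(id);
    }
  }

  // Refresh as described in the issue: rebuild the pending set from every
  // known container, ignoring reports that were already received.
  void refreshRebuildingAll(long... allContainerIds) {
    pending.clear();
    for (long id : allContainerIds) {
      pending.add(id);
    }
  }

  // One possible alternative: rebuild, but skip IDs that were already reported.
  void refreshExcludingReported(long... allContainerIds) {
    pending.clear();
    for (long id : allContainerIds) {
      if (!reported.contains(id)) {
        pending.add(id);
      }
    }
  }

  // Simplified status text listing the containers still being waited on.
  String statusText() {
    return "Waiting for containers " + pending;
  }
}
```

For example, after refreshRebuildingAll(1, 2), onRegistrationReport(1, 2) and refreshRebuildingAll(1, 2, 3), the status text lists [1, 2, 3], including the two containers that were already reported. With refreshExcludingReported it lists only [3], but container 3 still stays pending forever, because in this model, as in the issue, reports arrive only at registration.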
 

e.g.:
 * Initially: {{Refreshed RATIS Containers threshold count to 6}}
 * Later: {{Refreshed RATIS Containers threshold count to 7}}
 * Later still: {{Refreshed RATIS Containers threshold count to 8}}

A DN that registered earlier has reported only 6 containers. If new containers 
are added and closed afterwards, before safe mode exits, they are never counted, 
and safe mode may never exit.
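The arithmetic behind this example can be sketched as follows, assuming the rule compares the fraction of reported containers against a cutoff. The class ThresholdExample is hypothetical, and the cutoff of 0.99 is an assumed value for illustration, not taken from this issue.

```java
// Hypothetical sketch of the example above, assuming the rule exits once the
// fraction of reported containers reaches a cutoff. Not Ozone code.
class ThresholdExample {
  // Assumed cutoff for illustration only.
  static final double CUTOFF = 0.99;

  // Fraction of the tracked (threshold) containers that have been reported.
  static double reportedFraction(int reported, int threshold) {
    if (threshold == 0) {
      return 1.0; // nothing to wait for
    }
    return (double) reported / threshold;
  }

  // Whether this simplified rule would allow safe mode to exit.
  static boolean canExit(int reported, int threshold) {
    return reportedFraction(reported, threshold) >= CUTOFF;
  }
}
```

With 6 reported containers, canExit(6, 6) is true, but once refreshes raise the threshold to 7 and then 8, the fraction drops to about 0.86 and then 0.75, and no further report arrives to raise it again.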

 

This mostly happens on a *Follower* SCM, where new containers are added as a 
result of:
 * sync from the Leader node
 * pending Ratis transactions updating the DB with new containers (this may 
also affect the Leader)

 

After a refresh, however, a newly added container is never counted, because the 
DN sends the container notification only with its registration request.


As a result, the safe mode rule may never be satisfied and safe mode may not exit.

  was:
During refresh, ContainerSafeModeRule (for both Ratis and EC) re-initializes its 
containers. However, it does not remove containers that DNs already reported 
while registering with SCM, so getStatusText() still lists containers that have 
already been reported.
 
Further, reinitialize() picks up containers that were recently added to 
ContainerManager, but no further DN notification arrives to count those 
containers.
 
As a result, the safe mode rule may never be satisfied and safe mode may not exit.


> ContainerSafeModeRule shows already reported containers in sample status
> ------------------------------------------------------------------------
>
>                 Key: HDDS-15238
>                 URL: https://issues.apache.org/jira/browse/HDDS-15238
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Sumit Agrawal
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
