[ 
https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-17368:
-------------------------------
    Component/s: ha
                 namenode

> HA: Standby should exit safemode when resources recover from low to available
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17368
>                 URL: https://issues.apache.org/jira/browse/HDFS-17368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>            Reporter: Zilong Zhu
>            Assignee: Zilong Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> The NameNodeResourceMonitor (NNRM) automatically enters safemode (SM) when 
> it detects that resources are not sufficient. NNRM runs only on the active 
> NameNode (ANN). If both the ANN and the standby NameNode (SNN) run low on 
> resources, the ANN enters SM. If the SNN's disk space is later restored, the 
> SNN will become ANN and the ANN will become SNN. However, at this point the 
> new SNN will not exit SM, even after its disk has recovered.
> Consider the following scenario:
>  * Initially, nn-1 is active and nn-2 is standby. Both nn-1 and nn-2 have 
> insufficient resources in dfs.namenode.name.dir. The NameNodeResourceMonitor 
> detects the resource issue and puts nn-1 into safemode.
>  * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
> safemode (OFF) and standby.
>  * After a period of time, the resources in nn-2's dfs.namenode.name.dir 
> recover, triggering failover.
>  * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode 
> (OFF) and active.
>  * Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
>  * However, since nn-1 is standby but in safemode (ON), it is unable to exit 
> safe mode automatically.
> There are two possible ways to fix this issue:
>  # If the SNN is detected to be in SM because of low resources, it exits SM 
> once its resources are available again.
>  # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing 
> NNRM back to the SNN.
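
The scenario and the first proposed fix can be sketched as a small, self-contained Java simulation. This is only a toy model under stated assumptions, not the actual patch: ToyNameNode, monitorTick and the standbyExitsSafemode flag are hypothetical names for illustration and do not correspond to real HDFS classes or configuration keys. The model only captures two behaviours: the active node enters safemode when resources are low, and (with the fix enabled) a standby node leaves a low-resource safemode once its resources are back.

```java
class StandbySafemodeSketch {

    enum HaState { ACTIVE, STANDBY }

    /** Toy NameNode model. Hypothetical; not a real HDFS API. */
    static class ToyNameNode {
        final String name;
        HaState haState;
        boolean resourcesAvailable = true;
        boolean inLowResourceSafemode = false;

        ToyNameNode(String name, HaState haState) {
            this.name = name;
            this.haState = haState;
        }

        /**
         * One resource-check tick.
         * Active: enter safemode when resources are low (as the issue describes).
         * Standby: only with the sketched fix, leave a low-resource safemode
         * once resources are available again.
         */
        void monitorTick(boolean standbyExitsSafemode) {
            if (haState == HaState.ACTIVE) {
                if (!resourcesAvailable) {
                    inLowResourceSafemode = true;
                }
            } else if (standbyExitsSafemode
                    && inLowResourceSafemode
                    && resourcesAvailable) {
                inLowResourceSafemode = false;
            }
        }
    }

    /** Replays the scenario; returns whether nn-1 is still in safemode at the end. */
    static boolean runScenario(boolean standbyExitsSafemode) {
        ToyNameNode nn1 = new ToyNameNode("nn-1", HaState.ACTIVE);
        ToyNameNode nn2 = new ToyNameNode("nn-2", HaState.STANDBY);

        // Both nodes run low on resources; only the active one enters safemode.
        nn1.resourcesAvailable = false;
        nn2.resourcesAvailable = false;
        nn1.monitorTick(standbyExitsSafemode);
        nn2.monitorTick(standbyExitsSafemode);

        // nn-2 recovers and a failover swaps the roles.
        nn2.resourcesAvailable = true;
        nn1.haState = HaState.STANDBY;
        nn2.haState = HaState.ACTIVE;
        nn1.monitorTick(standbyExitsSafemode);
        nn2.monitorTick(standbyExitsSafemode);

        // nn-1 recovers while it is standby.
        nn1.resourcesAvailable = true;
        nn1.monitorTick(standbyExitsSafemode);
        nn2.monitorTick(standbyExitsSafemode);

        return nn1.inLowResourceSafemode;
    }

    public static void main(String[] args) {
        System.out.println("without fix: nn-1 in safemode = " + runScenario(false));
        System.out.println("with fix:    nn-1 in safemode = " + runScenario(true));
    }
}
```

Running the sketch without the standby check leaves nn-1 stuck in safemode after its disk recovers, which mirrors the reported behaviour; enabling the check lets the standby leave safemode on the next tick.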
