Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-11 Thread via GitHub


zhuzilong2013 commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1519875686


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   Thanks for your review. I passed the failed unit test locally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-11 Thread via GitHub


Hexiaoqiao commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1988370510

   cc @zhangshuyan0 any more suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-11 Thread via GitHub


Hexiaoqiao commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1519663672


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   Great. Got it. Please check if the failed unit test is related with this 
changes. Others look good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-11 Thread via GitHub


zhuzilong2013 commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1519420043


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   Consider the following scenario:
   
   Initially, nn-1 is active and nn-2 is standby. The insufficient resources of 
both nn-1 and nn-2 in dfs.namenode.name.dir, the NameNodeResourceMonitor 
detects the resource issue and puts nn01 into safemode.
   At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
safemode (OFF) and standby.
   After a period of time, the resources in nn-2's dfs.namenode.name.dir 
recover, triggering failover.
   Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode (OFF) 
and active.
   Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
   However, since nn-1 is standby but in safemode (ON), it unable to exit safe 
mode automatically.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-09 Thread via GitHub


zhuzilong2013 commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1518633436


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   The mention in "HDFS-2915" is that SNN should not enter resource low safe 
mode. NNRM thread removed from SNN.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-09 Thread via GitHub


zhuzilong2013 commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1518555454


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   I reused the logic from 
[HDFS-17231](https://issues.apache.org/jira/browse/HDFS-17231), and I believe 
there is no issue. HDFS-17231 enables the ANN to automatically exit 
ResourceLowSafeMode. 
   At the same time, I noticed that the 'leaveSafeMode(false)' method also 
exits 'StartupSafeMode'. I'm not sure if this is an issue; I mentioned this 
phenomenon in [HDFS-17402](https://issues.apache.org/jira/browse/HDFS-17402).
   If necessary, I can fix it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-03-07 Thread via GitHub


Hexiaoqiao commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1516122704


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   It is reasonable at first glance, not think carefully, any cases to trigger 
Standby leave safemode untimely? Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-02 Thread via GitHub


zhuzilong2013 commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1925068452

   The failed unit test seems unrelated to the change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-02 Thread via GitHub


hadoop-yetus commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1924743160

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 202m 25s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 288m 39s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6518 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 9844b3816c24 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b80669bb3b86fe940014795556044567f967d69f |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/3/testReport/ |
   | Max. process+thread count | 5006 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git

Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-02 Thread via GitHub


zhuzilong2013 commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1924146069

   @Hexiaoqiao @tasanuma @aajisaka   Hi~ sir. Could you please help me review 
this PR when you are free?  Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-02 Thread via GitHub


hadoop-yetus commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1924118883

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   6m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  32m 50s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/2/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 38s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 200m 56s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 294m 41s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6518 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 256047b7ad38 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4126317a2fb5da07044ea99788e3d73ce30eecb2 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/2/testReport/ |
   | Max. process+thread count | 4362 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.

Re: [PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-01 Thread via GitHub


hadoop-yetus commented on PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#issuecomment-1921949946

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 46s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 199m 36s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 286m 13s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6518 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux a7b554722c26 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 13bccda7c70e086d64ce3fc3dab56325c0cbdefa |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/1/testReport/ |
   | Max. process+thread count | 4188 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6518/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https

[PR] HDFS-17368. HA: Standby should exit safemode when resources are from low available [hadoop]

2024-02-01 Thread via GitHub


zhuzilong2013 opened a new pull request, #6518:
URL: https://github.com/apache/hadoop/pull/6518

   ### Description of PR
   Refer to HDFS-17368. 
   
   The NameNodeResourceMonitor automatically enters safemode when it detects 
that the resources are not suffcient. NNRM is only in ANN. If both ANN and SNN 
enter SM due to low resources, and later SNN's disk space is restored, SNN 
willl become ANN and ANN will become SNN. However, at this point, SNN will not 
exit the SM, even if the disk is recovered.
   
   Consider the following scenario:
   
   - Initially, nn-1 is active and nn-2 is standby. The insufficient resources 
of both nn-1 and nn-2 in dfs.namenode.name.dir, the NameNodeResourceMonitor 
detects the resource issue and puts nn01 into safemode.
   - At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
safemode (OFF) and standby.
   - After a period of time, the resources in nn-2's dfs.namenode.name.dir 
recover, triggering failover.
   - Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode 
(OFF) and active.
   - Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
   - However, since nn-1 is standby but in safemode (ON), it unable to exit 
safe mode automatically.
   
   If SNN is detected to be in SM(because low resource), it will exit. 
   
   ### How was this patch tested?
   Test in a production environment
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org