[jira] [Created] (HDFS-17057) Add DataNode maintenance states to Federation UI

2023-06-22 Thread Haiyang Hu (Jira)
Haiyang Hu created HDFS-17057:
-

 Summary: Add DataNode maintenance states to Federation UI 
 Key: HDFS-17057
 URL: https://issues.apache.org/jira/browse/HDFS-17057
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haiyang Hu


Add DataNode maintenance states to Federation UI 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17057) Add DataNode maintenance states to Federation UI

2023-06-22 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu reassigned HDFS-17057:
-

Assignee: Haiyang Hu

> Add DataNode maintenance states to Federation UI 
> -
>
> Key: HDFS-17057
> URL: https://issues.apache.org/jira/browse/HDFS-17057
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>
> Add DataNode maintenance states to Federation UI 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736329#comment-17736329
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

hadoop-yetus commented on PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#issuecomment-1603488495

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 17s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 5 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m  4s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  31m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  30m 34s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 248m  2s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 381m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5764 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 383a9054d10f 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ba9e87f2b294deb1cd67100e1c39e317ffb76295 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/3/testReport/ |
   | Max. process+thread count | 2108 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736314#comment-17736314
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

hadoop-yetus commented on PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#issuecomment-1603458394

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 5 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 59s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 12s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 219m 11s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 329m 14s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5764 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f1712d7a704a 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ba9e87f2b294deb1cd67100e1c39e317ffb76295 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/4/testReport/ |
   | Max. process+thread count | 3144 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Export HAState 

[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736312#comment-17736312
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

xinglin commented on code in PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#discussion_r1239137137


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2051,6 +2056,26 @@ synchronized HAServiceState getServiceState() {
 return state.getServiceState();
   }
 
+  /**
+   * Emit Namenode HA service state as an integer so that one can monitor NN HA
+   * state based on this metric.
+   *
+   * @return  0 when not fully started
+   *  1 for active or standalone (non-HA) NN
+   *  2 for standby
+   *  3 for observer
+   *

Review Comment:
   Searching codebase, it seems we would set a state to STOPPING state only in 
YARN ResourceManager HA. We are not using that state in HDFS.





> Export HAState as a metric from Namenode for monitoring
> ---
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>
> We'd like measure the uptime for Namenodes: percentage of time when we have 
> the active/standby/observer node available (up and running). We could monitor 
> the namenode from an external service, such as ZKFC. But that would require 
> the external service to be available 100% itself. And when this third-party 
> external monitoring service is down, we won't have info on whether our 
> Namenodes are still up.
> We propose to take a different approach: we will emit Namenode state directly 
> from namenode itself. Whenever we miss a data point for this metric, we 
> consider the corresponding namenode to be down/not available. In other words, 
> we assume the metric collection/monitoring infrastructure to be 100% reliable.
> One implementation detail: in hadoop, we have the _NameNodeMetrics_ class, 
> which is currently used to emit all metrics for {_}NameNode.java{_}. However, 
> we don't think that is a good place to emit NameNode HAState. HAState is 
> stored in NameNode.java and we should directly emit it from NameNode.java. 
> Otherwise, we basically duplicate this info in two classes and we would have 
> to keep them in sync. Besides, _NameNodeMetrics_ class does not have a 
> reference to the _NameNode_ object which it belongs to. An _NameNodeMetrics_ 
> is created by a _static_ function _initMetrics()_ in {_}NameNode.java{_}.
> We shouldn't emit HA state from FSNameSystem.java either, as it is 
> initialized from NameNode.java and all state transitions are implemented in 
> NameNode.java.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736309#comment-17736309
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

melissayou commented on code in PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#discussion_r1239128912


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2051,6 +2056,26 @@ synchronized HAServiceState getServiceState() {
 return state.getServiceState();
   }
 
+  /**
+   * Emit Namenode HA service state as an integer so that one can monitor NN HA
+   * state based on this metric.
+   *
+   * @return  0 when not fully started
+   *  1 for active or standalone (non-HA) NN
+   *  2 for standby
+   *  3 for observer
+   *

Review Comment:
   I saw HAState has a stopping enum. We won't encounter that state?





> Export HAState as a metric from Namenode for monitoring
> ---
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>
> We'd like measure the uptime for Namenodes: percentage of time when we have 
> the active/standby/observer node available (up and running). We could monitor 
> the namenode from an external service, such as ZKFC. But that would require 
> the external service to be available 100% itself. And when this third-party 
> external monitoring service is down, we won't have info on whether our 
> Namenodes are still up.
> We propose to take a different approach: we will emit Namenode state directly 
> from namenode itself. Whenever we miss a data point for this metric, we 
> consider the corresponding namenode to be down/not available. In other words, 
> we assume the metric collection/monitoring infrastructure to be 100% reliable.
> One implementation detail: in hadoop, we have the _NameNodeMetrics_ class, 
> which is currently used to emit all metrics for {_}NameNode.java{_}. However, 
> we don't think that is a good place to emit NameNode HAState. HAState is 
> stored in NameNode.java and we should directly emit it from NameNode.java. 
> Otherwise, we basically duplicate this info in two classes and we would have 
> to keep them in sync. Besides, _NameNodeMetrics_ class does not have a 
> reference to the _NameNode_ object which it belongs to. An _NameNodeMetrics_ 
> is created by a _static_ function _initMetrics()_ in {_}NameNode.java{_}.
> We shouldn't emit HA state from FSNameSystem.java either, as it is 
> initialized from NameNode.java and all state transitions are implemented in 
> NameNode.java.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736289#comment-17736289
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

hadoop-yetus commented on PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#issuecomment-1603360305

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  8s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 37s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 58s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 72 unchanged - 
0 fixed = 73 total (was 72)  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 260m 10s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 378m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSUpgrade |
   |   | hadoop.hdfs.server.namenode.TestBackupNode |
   |   | hadoop.hdfs.TestDFSRollback |
   |   | hadoop.hdfs.server.namenode.TestNameNodeMetricsLogger |
   |   | hadoop.hdfs.TestDFSFinalize |
   |   | hadoop.hdfs.TestHDFSServerPorts |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5764/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5764 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 6ec9fcc6fcf3 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3ede4ef8d2206de23aa992ead2d19baeb4ebf88a |
   | Default Java | Private 

[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736248#comment-17736248
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

xinglin commented on PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#issuecomment-1603146650

   Hi @goiri,
   
   Summary of changes to fix unit test failures. 
   - Make two subclasses of NameNode as metric source as well, by adding a 
`@Metrics` annotation, since we have made NameNode a metric source.  
   - Fixed `getCurrentBlockPoolID() `bug in a couple of unit tests. We set 
`NameNode.started ` flag to `false` in `NameNode.stop()` method. However, if we 
pass a `cluster` object to `getCurrentBlockPoolID`(), it will check to ensure 
the NameNode is started. Fixed it by passing a `null` to 
`getCurrentBlockPoolID()`, which is what existing code is doing as well.
   
   TestRollingUpgrade passed all tests on my laptop. Let's see how the build 
goes.




> Export HAState as a metric from Namenode for monitoring
> ---
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>
> We'd like measure the uptime for Namenodes: percentage of time when we have 
> the active/standby/observer node available (up and running). We could monitor 
> the namenode from an external service, such as ZKFC. But that would require 
> the external service to be available 100% itself. And when this third-party 
> external monitoring service is down, we won't have info on whether our 
> Namenodes are still up.
> We propose to take a different approach: we will emit Namenode state directly 
> from namenode itself. Whenever we miss a data point for this metric, we 
> consider the corresponding namenode to be down/not available. In other words, 
> we assume the metric collection/monitoring infrastructure to be 100% reliable.
> One implementation detail: in hadoop, we have the _NameNodeMetrics_ class, 
> which is currently used to emit all metrics for {_}NameNode.java{_}. However, 
> we don't think that is a good place to emit NameNode HAState. HAState is 
> stored in NameNode.java and we should directly emit it from NameNode.java. 
> Otherwise, we basically duplicate this info in two classes and we would have 
> to keep them in sync. Besides, _NameNodeMetrics_ class does not have a 
> reference to the _NameNode_ object which it belongs to. An _NameNodeMetrics_ 
> is created by a _static_ function _initMetrics()_ in {_}NameNode.java{_}.
> We shouldn't emit HA state from FSNameSystem.java either, as it is 
> initialized from NameNode.java and all state transitions are implemented in 
> NameNode.java.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17055) Export HAState as a metric from Namenode for monitoring

2023-06-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736241#comment-17736241
 ] 

ASF GitHub Bot commented on HDFS-17055:
---

xinglin commented on PR #5764:
URL: https://github.com/apache/hadoop/pull/5764#issuecomment-1603105542

   Hi @goiri,
   
   thanks for taking a look at this PR and approving it. Please don't merge it 
yet. I am still working on fixing some unit test failures.




> Export HAState as a metric from Namenode for monitoring
> ---
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
>
> We'd like measure the uptime for Namenodes: percentage of time when we have 
> the active/standby/observer node available (up and running). We could monitor 
> the namenode from an external service, such as ZKFC. But that would require 
> the external service to be available 100% itself. And when this third-party 
> external monitoring service is down, we won't have info on whether our 
> Namenodes are still up.
> We propose to take a different approach: we will emit Namenode state directly 
> from namenode itself. Whenever we miss a data point for this metric, we 
> consider the corresponding namenode to be down/not available. In other words, 
> we assume the metric collection/monitoring infrastructure to be 100% reliable.
> One implementation detail: in hadoop, we have the _NameNodeMetrics_ class, 
> which is currently used to emit all metrics for {_}NameNode.java{_}. However, 
> we don't think that is a good place to emit NameNode HAState. HAState is 
> stored in NameNode.java and we should directly emit it from NameNode.java. 
> Otherwise, we basically duplicate this info in two classes and we would have 
> to keep them in sync. Besides, _NameNodeMetrics_ class does not have a 
> reference to the _NameNode_ object which it belongs to. An _NameNodeMetrics_ 
> is created by a _static_ function _initMetrics()_ in {_}NameNode.java{_}.
> We shouldn't emit HA state from FSNameSystem.java either, as it is 
> initialized from NameNode.java and all state transitions are implemented in 
> NameNode.java.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param

2023-06-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736174#comment-17736174
 ] 

Ayush Saxena commented on HDFS-17056:
-

Found while trying 3.3.6 RC, minor stuff should be present in trunk as well...

A typical fix would be to add a simple if check like other commands in the 
verifyClusterOutput command, something like this & things should work
{code:java}
throw e;
  }
} else {
  if (args.size() > 0) {
System.err.println(getName() + ": Too many arguments");
return 1;
  }
  result = dfs.getECTopologyResultForPolicies(); {code}

> EC: Fix verifyClusterSetup output in case of an invalid param
> -
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Ayush Saxena
>Priority: Major
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy then the name of policies, else it 
> defaults to all enabled policies.
> In case there are additional invalid options it silently ignores them, unlike 
> other EC commands which throws out Too Many Argument exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param

2023-06-22 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-17056:
---

 Summary: EC: Fix verifyClusterSetup output in case of an invalid 
param
 Key: HDFS-17056
 URL: https://issues.apache.org/jira/browse/HDFS-17056
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ec
Reporter: Ayush Saxena


{code:java}
bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
XOR-2-1-1024k. The number of DataNodes is only 3. {code}
verifyClusterSetup requires -policy then the name of policies, else it defaults 
to all enabled policies.

In case there are additional invalid options it silently ignores them, unlike 
other EC commands which throws out Too Many Argument exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org