[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=796559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796559
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 30/Jul/22 02:30
Start Date: 30/Jul/22 02:30
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r933715281


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -537,35 +547,34 @@ public int getNumEnteringMaintenanceDataNodes() {
 
   @Override // NameNodeMXBean
   public String getNodeUsage() {
-    float median = 0;
-    float max = 0;
-    float min = 0;
-    float dev = 0;
+    double median = 0;
+    double max = 0;
+    double min = 0;
+    double dev = 0;
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
+      DatanodeInfo[] live = null;
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
+      } else {
+        LOG.debug("Getting node usage is disabled.");
+      }
 
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
+      if (live != null && live.length > 0) {
+        double[] usages = new double[live.length];
         int i = 0;
         for (DatanodeInfo dn : live) {
           usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
         }
-        totalDfsUsed /= live.length;
         Arrays.sort(usages);
         median = usages[usages.length / 2];
         max = usages[usages.length - 1];
         min = usages[0];
 
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
-        }
-        dev = (float) Math.sqrt(dev / usages.length);
+        StandardDeviation deviation = new StandardDeviation();
+        dev = deviation.evaluate(usages);
       }
     } catch (IOException e) {
       LOG.error("Cannot get the live nodes: {}", e.getMessage());

Review Comment:
   Thanks @slfan1989 @goiri for your review.  I think `e.getMessage()` is 
enough. @slfan1989 Do you have some cases that need the full stack? 
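   For reference, a minimal sketch (not part of the patch) of the two SLF4J logging 
forms being discussed; the wrapper class here is purely illustrative:
   
   ```java
   import java.io.IOException;
   
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   // Illustrative only: contrasts the two logging styles from this thread.
   public class NodeUsageLoggingExample {
     private static final Logger LOG =
         LoggerFactory.getLogger(NodeUsageLoggingExample.class);
   
     static void report(IOException e) {
       // Message-only form: records just the exception message, no stack trace.
       LOG.error("Cannot get the live nodes: {}", e.getMessage());
   
       // Throwable form: passing the exception as the last argument also
       // records the full stack trace in the log output.
       LOG.error("Cannot get the live nodes.", e);
     }
   }
   ```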





Issue Time Tracking
---

Worklog Id: (was: 796559)
Time Spent: 2h  (was: 1h 50m)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=796166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796166
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 28/Jul/22 18:52
Start Date: 28/Jul/22 18:52
Worklog Time Spent: 10m 
  Work Description: goiri commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r932570164


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -537,35 +547,34 @@ public int getNumEnteringMaintenanceDataNodes() {
 
   @Override // NameNodeMXBean
   public String getNodeUsage() {
-    float median = 0;
-    float max = 0;
-    float min = 0;
-    float dev = 0;
+    double median = 0;
+    double max = 0;
+    double min = 0;
+    double dev = 0;
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
+      DatanodeInfo[] live = null;
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
+      } else {
+        LOG.debug("Getting node usage is disabled.");
+      }
 
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
+      if (live != null && live.length > 0) {
+        double[] usages = new double[live.length];
         int i = 0;
         for (DatanodeInfo dn : live) {
           usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
         }
-        totalDfsUsed /= live.length;
         Arrays.sort(usages);
         median = usages[usages.length / 2];
         max = usages[usages.length - 1];
         min = usages[0];
 
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
-        }
-        dev = (float) Math.sqrt(dev / usages.length);
+        StandardDeviation deviation = new StandardDeviation();
+        dev = deviation.evaluate(usages);
       }
     } catch (IOException e) {
       LOG.error("Cannot get the live nodes: {}", e.getMessage());

Review Comment:
   > I feel it would be better this way.
   > 
   > ```
   > LOG.error("Cannot get the live nodes.", e);
   > ```
   
   Do we want to have the full stack trace? I think it is pretty clear what the 
error is here without it.





Issue Time Tracking
---

Worklog Id: (was: 796166)
Time Spent: 1h 50m  (was: 1h 40m)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=795936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795936
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 28/Jul/22 05:31
Start Date: 28/Jul/22 05:31
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r931795987


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -537,35 +547,34 @@ public int getNumEnteringMaintenanceDataNodes() {
 
   @Override // NameNodeMXBean
   public String getNodeUsage() {
-    float median = 0;
-    float max = 0;
-    float min = 0;
-    float dev = 0;
+    double median = 0;
+    double max = 0;
+    double min = 0;
+    double dev = 0;
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
+      DatanodeInfo[] live = null;
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
+      } else {
+        LOG.debug("Getting node usage is disabled.");
+      }
 
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
+      if (live != null && live.length > 0) {
+        double[] usages = new double[live.length];
         int i = 0;
         for (DatanodeInfo dn : live) {
           usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
         }
-        totalDfsUsed /= live.length;
         Arrays.sort(usages);
         median = usages[usages.length / 2];
         max = usages[usages.length - 1];
         min = usages[0];
 
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
-        }
-        dev = (float) Math.sqrt(dev / usages.length);
+        StandardDeviation deviation = new StandardDeviation();
+        dev = deviation.evaluate(usages);
       }
     } catch (IOException e) {
       LOG.error("Cannot get the live nodes: {}", e.getMessage());

Review Comment:
   I feel it would be better this way.
   
   ```
   LOG.error("Cannot get the live nodes.", e).
   ```





Issue Time Tracking
---

Worklog Id: (was: 795936)
Time Spent: 1h 40m  (was: 1.5h)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=795860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795860
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 28/Jul/22 00:45
Start Date: 28/Jul/22 00:45
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1197525467

   @goiri Hi, master, can you help me merge it into the trunk?




Issue Time Tracking
---

Worklog Id: (was: 795860)
Time Spent: 1.5h  (was: 1h 20m)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=794614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794614
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 24/Jul/22 08:54
Start Date: 24/Jul/22 08:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1193275454

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 56s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 39s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  22m 16s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 120m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4606 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux fadcae59afbb 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 287868a54232c45a03b5f32b9d9ecc084419d585 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/3/testReport/ |
   | Max. process+thread count | 2808 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 

[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=794595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794595
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 24/Jul/22 04:27
Start Date: 24/Jul/22 04:27
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1193244409

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 58s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 37s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 50s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 25s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  41m  5s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 156m 21s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4606 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 911f56c609dd 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e71c527a2f1dae800980e1da5694914a5fd0a93a |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 

[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=794584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794584
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 24/Jul/22 01:50
Start Date: 24/Jul/22 01:50
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1193226247

   @goiri Sir, I have updated the patch and added some UTs, please help me 
review it again. Thanks




Issue Time Tracking
---

Worklog Id: (was: 794584)
Time Spent: 1h  (was: 50m)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793961
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 22:37
Start Date: 21/Jul/22 22:37
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r927154231


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -544,28 +548,30 @@ public String getNodeUsage() {
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();

Review Comment:
   Yes, `rpcServer.getDatanodeReport()` is expensive. As the number of DNs or 
downstream nameservices in the cluster increases, it becomes more and more 
expensive, for example with 10,000+ DNs, 50,000+ DNs, 20+ NSs, or 50+ NSs.
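   
   A rough, hypothetical sketch of that fan-out cost (none of these names are the 
real RouterRpcServer internals): the router asks every downstream nameservice for 
its full live-datanode report and merges the results, so the work grows with both 
the nameservice count and the datanode count.
   
   ```java
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical illustration of the fan-out, not the actual router code.
   class NodeUsageFanOutSketch {
     interface NameserviceClient {
       // One report per downstream nameservice, each covering (almost) every DN.
       List<String> liveDatanodeIds() throws Exception;
     }
   
     static int mergedLiveDatanodes(List<NameserviceClient> nameservices)
         throws Exception {
       Set<String> seen = new HashSet<>();
       for (NameserviceClient ns : nameservices) {   // grows with the NS count
         for (String dnId : ns.liveDatanodeIds()) {  // grows with the DN count
           seen.add(dnId);
         }
       }
       return seen.size();
     }
   }
   ```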





Issue Time Tracking
---

Worklog Id: (was: 793961)
Time Spent: 50m  (was: 40m)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793959
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 22:37
Start Date: 21/Jul/22 22:37
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r927153919


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java:
##
@@ -315,6 +315,9 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic {
       FEDERATION_ROUTER_PREFIX + "dn-report.cache-expire";
   public static final long DN_REPORT_CACHE_EXPIRE_MS_DEFAULT =
       TimeUnit.SECONDS.toMillis(10);
+  public static final String DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY =
+      FEDERATION_ROUTER_PREFIX + "enable.get.dn.usage";
+  public static final boolean DFS_ROUTER_ENABLE_GET_DN_USAGE_DEFAULT = true;

Review Comment:
   Copy, I will do it.
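   
   As an illustration (not part of the patch), the new switch could be turned off 
in a Router's configuration roughly as below, assuming FEDERATION_ROUTER_PREFIX 
resolves to "dfs.federation.router."; the helper class name is hypothetical:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;
   
   // Hypothetical helper: disables datanode-usage collection in RBFMetrics.
   public class DisableDnUsageExample {
     public static Configuration routerConf() {
       Configuration conf = new Configuration();
       // Assumption: with FEDERATION_ROUTER_PREFIX = "dfs.federation.router.",
       // this key resolves to "dfs.federation.router.enable.get.dn.usage"
       // (default true, i.e. usage collection stays enabled).
       conf.setBoolean(RBFConfigKeys.DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY, false);
       return conf;
     }
   }
   ```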





Issue Time Tracking
---

Worklog Id: (was: 793959)
Time Spent: 40m  (was: 0.5h)

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793863
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 17:31
Start Date: 21/Jul/22 17:31
Worklog Time Spent: 10m 
  Work Description: goiri commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r926936268


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -544,28 +548,30 @@ public String getNodeUsage() {
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {

Review Comment:
   I would do:
   ```
   DatanodeInfo[] live = null;
   if (this.enableGetDNUsage) {
     RouterRpcServer rpcServer = this.router.getRpcServer();
     live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
   } else {
     LOG.debug("Getting information is disabled."); // similar message
   }
   if (live != null && live.length > 0) {
   ```



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -544,28 +548,30 @@ public String getNodeUsage() {
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();

Review Comment:
   What is the expensive part of this whole block? this?



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##
@@ -544,28 +548,30 @@ public String getNodeUsage() {
 
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
+

[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793838
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 16:59
Start Date: 21/Jul/22 16:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1191727346

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 22s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 34s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  39m 50s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 44s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 147m 35s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4606 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 843009c68092 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4de60fb87f2f25087acab7b90d75cd0e20be622d |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4606/1/testReport/ |
   | Max. process+thread count | 2025 (vs. ulimit of 

[jira] [Work logged] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793769
 ]

ASF GitHub Bot logged work on HDFS-16678:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 14:31
Start Date: 21/Jul/22 14:31
Worklog Time Spent: 10m 
  Work Description: ZanderXu opened a new pull request, #4606:
URL: https://github.com/apache/hadoop/pull/4606

   ### Description of PR
   In our prod environment, we try to collect RBF metrics every 15s through 
jmx_exporter, and we found that the collection task often failed. 
   
   After tracing, we found that the collection task is blocked at 
getNodeUsage() in RBFMetrics, because it collects every datanode's usage from 
the downstream nameservices.  
   
   This is a very expensive and almost useless operation, because in most 
scenarios each downstream nameservice contains almost the same DNs. If needed, 
we can get the data usage from any one nameservice rather than from RBF.
   
   So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.
   
   




Issue Time Tracking
---

Worklog Id: (was: 793769)
Remaining Estimate: 0h
Time Spent: 10m

> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs. If needed, we can get the data usage from any one nameservice 
> rather than from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
