[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=614836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614836
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 05:08
Start Date: 25/Jun/21 05:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#issuecomment-868207854


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  11m 22s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  19m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  25m 12s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   7m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   8m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  35m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  47m 49s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  20m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 15s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/6/artifact/out/blanks-eol.txt)
 |  The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   3m 49s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  21m 41s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   7m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   8m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  34m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  46m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 780m 30s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/6/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | -1 :x: |  asflicense  |   1m 41s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/6/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 1108m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.tools.dynamometer.TestDynamometerInfra |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   |   | hadoop.hdfs.server.federation.router.TestRouterClientRejectOverload |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3086 |
   | Optional Tests | dupname asflicense codespell compile javac javadoc 
mvninstall mvnsite unit shadedclient spotbugs checkstyle |
   | uname | Linux cc2a45cce6c8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=614392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614392
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 09:07
Start Date: 24/Jun/21 09:07
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r657769249



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java
##
@@ -1757,7 +1757,7 @@ public void testRBFMetricsMethodsRelayOnStateStore() {
 // These methods relays on
 // {@link RBFMetrics#getActiveNamenodeRegistration()}
 assertEquals("{}", metrics.getNameservices());
-assertEquals(0, metrics.getNumLiveNodes());
+assertEquals(NUM_DNS * 2, metrics.getNumLiveNodes());

Review comment:
   getNumLiveNodes used to obtain the DN information through the StateStore; 
now it obtains it through RouterRpcServer#getCachedDatanodeReport. Since this 
test no longer applies to the StateStore path, I deleted it.
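
   For context, a minimal sketch of the code path described above; the exact
signature of getCachedDatanodeReport is assumed from this comment, not copied
from the patch:

   ```java
   import java.io.IOException;
   import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
   import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;
   import org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer;

   // Sketch: the live-node count now comes from the router's cached DN
   // report instead of the per-NN StateStore records.
   static int numLiveNodes(RouterRpcServer rpcServer) throws IOException {
     DatanodeInfo[] live =
         rpcServer.getCachedDatanodeReport(DatanodeReportType.LIVE);
     return live.length;
   }
   ```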




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 614392)
Time Spent: 2.5h  (was: 2h 20m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.
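
A minimal sketch of the de-duplication proposed above; the Membership record
and the helper name are illustrative stand-ins, not the real StateStore types:

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

class AggregationSketch {
  // Illustrative stand-in for a StateStore membership record.
  record Membership(String nameserviceId, long totalSpace) {}

  // Take the max per nameservice (active and standby NNs of one cluster
  // report the same numbers), then sum the per-nameservice maxima.
  static long aggregateMaxPerNameservice(
      List<Membership> members, ToLongFunction<Membership> f) {
    Map<String, Long> maxPerNs = members.stream()
        .collect(Collectors.toMap(
            Membership::nameserviceId,  // group key: the cluster id
            f::applyAsLong,             // value reported by this NN
            Math::max));                // keep only the max per key
    return maxPerNs.values().stream().mapToLong(Long::longValue).sum();
  }
}
```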



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=614384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614384
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 09:02
Start Date: 24/Jun/21 09:02
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r657765228



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRBFMetrics.java
##
@@ -382,4 +366,56 @@ private void testCapacity(FederationMBean bean) throws 
IOException {
 assertNotEquals(availableCapacity,
 BigInteger.valueOf(bean.getRemainingCapacity()));
   }
+
+  @Test
+  public void testDatanodeNumMetrics()
+  throws Exception {
+Configuration routerConf = new RouterConfigBuilder()
+.metrics()
+.http()
+.stateStore()
+.rpc()
+.build();
+MiniRouterDFSCluster cluster = new MiniRouterDFSCluster(false, 1);
+cluster.setNumDatanodesPerNameservice(0);
+cluster.addNamenodeOverrides(routerConf);
+cluster.startCluster();
+routerConf.setTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, 1, TimeUnit.SECONDS);
+cluster.addRouterOverrides(routerConf);
+cluster.startRouters();
+Router router = cluster.getRandomRouter().getRouter();
+// Register and verify all NNs with all routers
+cluster.registerNamenodes();
+cluster.waitNamenodeRegistration();
+RouterRpcServer rpcServer = router.getRpcServer();
+RBFMetrics rbfMetrics = router.getMetrics();
+// Create mock dn
+DatanodeInfo[] dNInfo = new DatanodeInfo[4];
+DatanodeInfo datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.setDecommissioned();
+dNInfo[0] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.setInMaintenance();
+dNInfo[1] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.startMaintenance();
+dNInfo[2] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.startDecommission();
+dNInfo[3] = datanodeInfo;
+
+rpcServer.getDnCache().put(HdfsConstants.DatanodeReportType.LIVE, dNInfo);

Review comment:
   Thanks for your reminder.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 614384)
Time Spent: 2h 20m  (was: 2h 10m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=614063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614063
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 23/Jun/21 15:13
Start Date: 23/Jun/21 15:13
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r657203156



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRBFMetrics.java
##
@@ -382,4 +366,56 @@ private void testCapacity(FederationMBean bean) throws 
IOException {
 assertNotEquals(availableCapacity,
 BigInteger.valueOf(bean.getRemainingCapacity()));
   }
+
+  @Test
+  public void testDatanodeNumMetrics()
+  throws Exception {
+Configuration routerConf = new RouterConfigBuilder()
+.metrics()
+.http()
+.stateStore()
+.rpc()
+.build();
+MiniRouterDFSCluster cluster = new MiniRouterDFSCluster(false, 1);
+cluster.setNumDatanodesPerNameservice(0);
+cluster.addNamenodeOverrides(routerConf);
+cluster.startCluster();
+routerConf.setTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, 1, TimeUnit.SECONDS);
+cluster.addRouterOverrides(routerConf);
+cluster.startRouters();
+Router router = cluster.getRandomRouter().getRouter();
+// Register and verify all NNs with all routers
+cluster.registerNamenodes();
+cluster.waitNamenodeRegistration();
+RouterRpcServer rpcServer = router.getRpcServer();
+RBFMetrics rbfMetrics = router.getMetrics();
+// Create mock dn
+DatanodeInfo[] dNInfo = new DatanodeInfo[4];
+DatanodeInfo datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.setDecommissioned();
+dNInfo[0] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.setInMaintenance();
+dNInfo[1] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.startMaintenance();
+dNInfo[2] = datanodeInfo;
+datanodeInfo = new DatanodeInfo.DatanodeInfoBuilder().build();
+datanodeInfo.startDecommission();
+dNInfo[3] = datanodeInfo;
+
+rpcServer.getDnCache().put(HdfsConstants.DatanodeReportType.LIVE, dNInfo);

Review comment:
   This is a little unconventional.
   You should mark the getter as VisibleForTesting.
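
   Concretely, something like the sketch below; the import path for
VisibleForTesting varies by branch, so it is shown as an assumption in a
comment:

   ```java
   // Assumed import; on trunk at the time this was the shaded Guava one:
   // import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;

   @VisibleForTesting
   public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {
     return dnCache;
   }
   ```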

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -164,13 +163,13 @@ public RBFMetrics(Router router) throws IOException {
   RouterStore.class);
 }
 
-// Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+
+// Use RpcServer dnCache
+this.dnCache = this.router.getRpcServer().getDnCache();

Review comment:
   Not much benefit in getting this and storing it in an attribute.
   We can just do the get each time we need to access it.
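
   In code, the suggestion amounts to dropping the field and reading through
to the RPC server at each use site; a sketch with a hypothetical helper name:

   ```java
   // Sketch (getCachedReport is a hypothetical name): no dnCache field in
   // RBFMetrics; fetch the cache from the RPC server wherever it is needed.
   private DatanodeInfo[] getCachedReport(DatanodeReportType type)
       throws ExecutionException {
     return this.router.getRpcServer().getDnCache().get(type);
   }
   ```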

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java
##
@@ -1757,7 +1757,7 @@ public void testRBFMetricsMethodsRelayOnStateStore() {
 // These methods relays on
 // {@link RBFMetrics#getActiveNamenodeRegistration()}
 assertEquals("{}", metrics.getNameservices());
-assertEquals(0, metrics.getNumLiveNodes());
+assertEquals(NUM_DNS * 2, metrics.getNumLiveNodes());

Review comment:
   Why is this like this now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 614063)
Time Spent: 2h 10m  (was: 2h)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=613816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613816
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 23/Jun/21 06:04
Start Date: 23/Jun/21 06:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#issuecomment-866554573


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 44s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  25m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   7m 54s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   7m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  34m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  46m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  21m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m  5s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  22m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 27s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/5/artifact/out/blanks-eol.txt)
 |  The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   3m 47s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  20m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   7m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   8m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  35m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  47m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 794m 42s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/5/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | -1 :x: |  asflicense  |   1m 25s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/5/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 1123m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
   |   | hadoop.yarn.server.router.clientrm.TestFederationClientInterceptor |
   |   | hadoop.tools.dynamometer.TestDynamometerInfra |
   |   | hadoop.hdfs.server.federation.router.TestRouter |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.router.TestRouterRPCClientRetries |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   |   | hadoop.hdfs.server.federation.router.TestRouterClientRejectOverload |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/5/artifact/out/Dockerfile
 |
   | GITHUB PR | 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=613407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613407
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 10:37
Start Date: 22/Jun/21 10:37
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r650121325



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   Yes, they should use the same dnCache. In addition, I want to extract 
NamesystemMetrics and NameNodeInfoMetrics into RBFMetrics. I don't think they 
should be serialized to the StateStore and then de-serialized to be used by 
RBFMetrics.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 613407)
Time Spent: 1h 50m  (was: 1h 40m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610275
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:34
Start Date: 14/Jun/21 07:34
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r649727373



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   > RouterRpcServer has a similar cache, can we use that?
   
   Yes, we can use it.
   
   NamesystemMetrics and NamenodeInfoMetrics will be stored in the StateStore by 
NamenodeBeanMetrics. They do not need to be stored, right? Wouldn't it be better 
for us to cache them in RBFMetrics?
   
   ```
   private void updateJMXParameters(
 String address, NamenodeStatusReport report) {
   try {
 // TODO part of this should be moved to its own utility
 getFsNamesystemMetrics(address, report);
 getNamenodeInfoMetrics(address, report);
   } catch (Exception e) {
 LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
   }
 }
   ```

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610208
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:27
Start Date: 14/Jun/21 07:27
Worklog Time Spent: 10m 
  Work Description: base111 commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r650119510



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   Yes, they should use the same dnCache. In addition, I want to extract 
NamesystemMetrics and NameNodeInfoMetrics into RBFMetrics. I don't think they 
should be serialized to the StateStore and then de-serialized to be used by 
RBFMetrics.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 610208)
Time Spent: 1.5h  (was: 1h 20m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610115
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:18
Start Date: 14/Jun/21 07:18
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#issuecomment-859227661


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 50s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 52s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   4m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  27m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   8m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   8m  4s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  35m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  48m 20s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  21m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 43s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/blanks-eol.txt)
 |  The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   3m 46s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |  21m  4s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   7m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   7m 46s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |  34m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  46m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 780m 40s | 
[/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/patch-unit-root.txt)
 |  root in the patch passed.  |
   | -1 :x: |  asflicense  |   1m 32s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 1117m 13s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.yarn.server.router.clientrm.TestFederationClientInterceptor |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.metrics.TestRBFMetrics |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   |   | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610095
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:16
Start Date: 14/Jun/21 07:16
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r650097026



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   I just want to avoid having two caches of the same thing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 610095)
Time Spent: 1h 10m  (was: 1h)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609955
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 21:46
Start Date: 10/Jun/21 21:46
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r649553648



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   getNodeUsage() already uses it, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609955)
Time Spent: 1h  (was: 50m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609954
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 21:44
Start Date: 10/Jun/21 21:44
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r649552979



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
 
 // Initialize the cache for the DN reports
 Configuration conf = router.getConfig();
-this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
 this.topTokenRealOwners = conf.getInt(
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
 RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+// Initialize the cache for the DN reports
+this.dnReportTimeOut = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_TIME_OUT,
+RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+long dnCacheExpire = conf.getTimeDuration(
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, 
TimeUnit.MILLISECONDS);
+this.dnCache = CacheBuilder.newBuilder()

Review comment:
   RouterRpcServer has a similar cache, can we use that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609954)
Time Spent: 50m  (was: 40m)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609560
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:32
Start Date: 10/Jun/21 08:32
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r648969521



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long getNameserviceAggregatedLong(
+  DatanodeReportType type, ToLongFunction<DatanodeInfo> f){
+long size = 0;
+try {
+  size = Arrays.stream(dnCache.get(type)).mapToLong(f).sum();

Review comment:
   Thank you for your reminder.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609560)
Time Spent: 40m  (was: 0.5h)

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609363
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 20:10
Start Date: 09/Jun/21 20:10
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r648647841



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long getNameserviceAggregatedLong(
+  DatanodeReportType type, ToLongFunction<DatanodeInfo> f){
+long size = 0;
+try {
+  size = Arrays.stream(dnCache.get(type)).mapToLong(f).sum();

Review comment:
   Extract the get(type)?
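
   That is, a sketch of the extraction:

   ```java
   // One cache lookup, extracted for readability, then the aggregation.
   DatanodeInfo[] nodes = dnCache.get(type);
   size = Arrays.stream(nodes).mapToLong(f).sum();
   ```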

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long getNameserviceAggregatedLong(
+  DatanodeReportType type, ToLongFunction<DatanodeInfo> f){
+long size = 0;
+try {
+  size = Arrays.stream(dnCache.get(type)).mapToLong(f).sum();
+} catch (ExecutionException e) {
+  LOG.debug("Cannot get " + type + " nodes", e.getMessage());
+}
+return size;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated Integer.
+   */
+  public int getNameserviceAggregatedInt(
+  DatanodeReportType type, Predicate<DatanodeInfo> f){
+int size = 0;
+try {
+  Arrays.stream(dnCache.get(DatanodeReportType.LIVE)).filter(f).count();

Review comment:
   We are not updating size at all, are we?
   It is also not the most intuitive code to read; maybe extract a little.
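
   A sketch of the fix being asked for, with the count actually assigned and
the lookup extracted; note the original also queried DatanodeReportType.LIVE
regardless of the type argument:

   ```java
   public int getNameserviceAggregatedInt(
       DatanodeReportType type, Predicate<DatanodeInfo> f) {
     int size = 0;
     try {
       // Extracted lookup; use the requested type, not LIVE unconditionally.
       DatanodeInfo[] nodes = dnCache.get(type);
       size = (int) Arrays.stream(nodes).filter(f).count();
     } catch (ExecutionException e) {
       LOG.debug("Cannot get {} nodes", type, e);
     }
     return size;
   }
   ```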

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/NamenodeBeanMetrics.java
##
@@ -500,6 +498,8 @@ private String getNodesImpl(final DatanodeReportType type) {
   LOG.error("Cannot get {} nodes, subclusters timed out responding", type);
 } catch (IOException e) {
   LOG.error("Cannot get " + type + " nodes", e);
+} catch (ExecutionException e) {
+  LOG.error("Cannot get " + type + " nodes", e);

Review comment:
   Do we support logger {}?
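
   For reference, SLF4J (which these classes log through) does support {}
placeholders, and a trailing Throwable argument is still printed with its
stack trace:

   ```java
   // Parameterized form: no string concatenation, and the exception is
   // rendered as a stack trace because it is the final argument.
   LOG.error("Cannot get {} nodes", type, e);
   ```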

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {

Review comment:
   Add a javadoc explaining the purpose.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache<DatanodeReportType, DatanodeInfo[]> getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609034
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 08:58
Start Date: 09/Jun/21 08:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#issuecomment-857518062


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 17s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  20m 10s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 100m 37s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.metrics.TestRBFMetrics |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3086 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 5d8ec9349326 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | 

[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=608972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608972
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 07:16
Start Date: 09/Jun/21 07:16
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi opened a new pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086


   Fixes the inaccurate statistics of the metrics getNumLiveNodes, 
getNumDeadNodes, getNumDecommissioningNodes, getNumDecomLiveNodes, 
getNumDecomDeadNodes, getNumInMaintenanceLiveDataNodes, 
getNumInMaintenanceDeadDataNodes, and getNumEnteringMaintenanceDataNodes.
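
   As a sketch of the shape these getters take with the aggregation helpers
discussed in the review (signatures assumed from the diffs above, not copied
from the final commit):

   ```java
   @Override
   public int getNumLiveNodes() {
     return getNameserviceAggregatedInt(DatanodeReportType.LIVE, dn -> true);
   }

   @Override
   public int getNumDecomLiveNodes() {
     return getNameserviceAggregatedInt(
         DatanodeReportType.LIVE, DatanodeInfo::isDecommissioned);
   }
   ```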


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 608972)
Remaining Estimate: 0h
Time Spent: 10m

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think that for the same ClusterID we only 
> need to take one max and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org