[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610275 ]
ASF GitHub Bot logged work on HDFS-16039:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/21 07:34
            Start Date: 14/Jun/21 07:34
    Worklog Time Spent: 10m
      Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r649727373

##########
File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##########
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
     // Initialize the cache for the DN reports
     Configuration conf = router.getConfig();
-    this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-        RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
     this.topTokenRealOwners = conf.getInt(
         RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
         RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+    // Initialize the cache for the DN reports
+    this.dnReportTimeOut = conf.getTimeDuration(
+        RBFConfigKeys.DN_REPORT_TIME_OUT,
+        RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+    long dnCacheExpire = conf.getTimeDuration(
+        RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+        RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS);
+    this.dnCache = CacheBuilder.newBuilder()

Review comment:
   > RouterRpcServer has a similar cache, can we use that?

   Yes, we can use it. NamesystemMetrics and NamenodeInfoMetrics will be stored in the StateStore by NamenodeBeanMetrics. They do not need to be stored there, right? Would it be better for us to cache them in RBFMetrics?
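The `CacheBuilder.newBuilder()` call in the hunk above configures a Guava cache whose DN-report entries expire `dnCacheExpire` milliseconds after they are written. As a rough illustration of that expire-after-write behaviour, here is a minimal stdlib-only sketch (the class name, the `Supplier` loader, and the `"dn-report"` key are all illustrative, not the real RBF types):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Minimal sketch of an expire-after-write cache, approximating what
 * Guava's CacheBuilder.newBuilder().expireAfterWrite(...) provides
 * for the DN reports. Not the actual RBF implementation.
 */
public class ExpiringCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long writeTimeMs;
    Entry(V value, long writeTimeMs) {
      this.value = value;
      this.writeTimeMs = writeTimeMs;
    }
  }

  private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
  private final long expireMs;

  public ExpiringCache(long expireMs) {
    this.expireMs = expireMs;
  }

  /** Return the cached value, reloading it once the entry is older than expireMs. */
  public V get(K key, Supplier<V> loader) {
    final long now = System.currentTimeMillis();
    Entry<V> e = map.compute(key, (k, old) ->
        (old == null || now - old.writeTimeMs >= expireMs)
            ? new Entry<>(loader.get(), now)   // expired or absent: reload
            : old);                            // still fresh: keep cached value
    return e.value;
  }
}
```

The point of the expiry is the same as in the patch: expensive DN report collection runs at most once per window, and callers inside that window get the cached copy.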
```java
  private void updateJMXParameters(
      String address, NamenodeStatusReport report) {
    try {
      // TODO part of this should be moved to its own utility
      getFsNamesystemMetrics(address, report);
      getNamenodeInfoMetrics(address, report);
    } catch (Exception e) {
      LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
    }
  }
```

Review comment:
   Yes, they should use the same dnCache. In addition, I want to extract NamesystemMetrics and NamenodeInfoMetrics into RBFMetrics. I don't think they should be serialized to the StateStore and then de-serialized to be used by RBFMetrics.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 610275)
    Time Spent: 1h 40m  (was: 1.5h)

> RBF: Some indicators of RBFMetrics count inaccurately
> -----------------------------------------------------
>
>                 Key: HDFS-16039
>                 URL: https://issues.apache.org/jira/browse/HDFS-16039
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>    Affects Versions: 3.4.0
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current algorithm accumulates these metrics over every NameNode, which leads to inaccurate counts: NameNodes belonging to the same cluster report the same values, so summing across all of them double-counts. I think that for each ClusterID we only need to take the maximum once, and then accumulate across clusters.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
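The aggregation fix the issue describes (take the maximum per ClusterID, then sum across clusters) can be sketched as follows. This is a hedged illustration only: `NnReport`, its fields, and `totalLiveNodes` are hypothetical stand-ins, not the real `MembershipStats`/`RBFMetrics` API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical per-NameNode report; the real records live in the StateStore.
record NnReport(String clusterId, int numLiveNodes) {}

public class Agg {
  /**
   * NameNodes of the same cluster see the same datanodes, so summing every
   * report over-counts. Instead: max per cluster, then sum across clusters.
   */
  static int totalLiveNodes(List<NnReport> reports) {
    Map<String, Integer> maxPerCluster = reports.stream().collect(
        Collectors.toMap(NnReport::clusterId, NnReport::numLiveNodes, Math::max));
    return maxPerCluster.values().stream().mapToInt(Integer::intValue).sum();
  }
}
```

With two NameNodes in `ns1` each reporting 10 live nodes and one in `ns2` reporting 5, a naive sum would give 25, while the per-cluster maximum gives the correct 15.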