[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610275 ]
ASF GitHub Bot logged work on HDFS-16039:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/21 07:34
            Start Date: 14/Jun/21 07:34
    Worklog Time Spent: 10m
      Work Description: zhuxiangyi commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r649727373

##########
File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##########
@@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException {
     // Initialize the cache for the DN reports
     Configuration conf = router.getConfig();
-    this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT,
-        RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
     this.topTokenRealOwners = conf.getInt(
         RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY,
         RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT);
+    // Initialize the cache for the DN reports
+    this.dnReportTimeOut = conf.getTimeDuration(
+        RBFConfigKeys.DN_REPORT_TIME_OUT,
+        RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+    long dnCacheExpire = conf.getTimeDuration(
+        RBFConfigKeys.DN_REPORT_CACHE_EXPIRE,
+        RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS);
+    this.dnCache = CacheBuilder.newBuilder()

Review comment:
   > RouterRpcServer has a similar cache, can we use that?

   Yes, we can use it. NamesystemMetrics and NamenodeInfoMetrics will be stored in the StateStore by NamenodeBeanMetrics. They do not need to be stored there, right? Would it be better for us to cache them in RBFMetrics?
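The `CacheBuilder.newBuilder()` call in the hunk above configures a Guava cache whose DN-report entries expire `dnCacheExpire` milliseconds after they are written. As a rough illustration of that expire-after-write behaviour, here is a minimal stdlib-only sketch (the class name, the `Supplier` loader, and the `"dn-report"` key are all illustrative, not the real RBF types):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Minimal sketch of an expire-after-write cache, approximating what
 * Guava's CacheBuilder.newBuilder().expireAfterWrite(...) provides
 * for the DN reports. Not the actual RBF implementation.
 */
public class ExpiringCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long writeTimeMs;
    Entry(V value, long writeTimeMs) {
      this.value = value;
      this.writeTimeMs = writeTimeMs;
    }
  }

  private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
  private final long expireMs;

  public ExpiringCache(long expireMs) {
    this.expireMs = expireMs;
  }

  /** Return the cached value, reloading it once the entry is older than expireMs. */
  public V get(K key, Supplier<V> loader) {
    final long now = System.currentTimeMillis();
    Entry<V> e = map.compute(key, (k, old) ->
        (old == null || now - old.writeTimeMs >= expireMs)
            ? new Entry<>(loader.get(), now)   // expired or absent: reload
            : old);                            // still fresh: keep cached value
    return e.value;
  }
}
```

The point of the expiry is the same as in the patch: expensive DN report collection runs at most once per window, and callers inside that window get the cached copy.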
```java
  private void updateJMXParameters(
      String address, NamenodeStatusReport report) {
    try {
      // TODO part of this should be moved to its own utility
      getFsNamesystemMetrics(address, report);
      getNamenodeInfoMetrics(address, report);
    } catch (Exception e) {
      LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
    }
  }
```

Review comment:
   Yes, they should use the same dnCache. In addition, I want to extract NamesystemMetrics and NamenodeInfoMetrics into RBFMetrics. I don't think they should be serialized to the StateStore and then de-serialized to be used by RBFMetrics.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 610275)
    Time Spent: 1h 40m  (was: 1.5h)

> RBF: Some indicators of RBFMetrics count inaccurately
> -----------------------------------------------------
>
>                 Key: HDFS-16039
>                 URL: https://issues.apache.org/jira/browse/HDFS-16039
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>    Affects Versions: 3.4.0
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current algorithm accumulates these metrics over every NameNode, which leads to inaccurate counts: NameNodes belonging to the same cluster report the same values, so summing across all of them double-counts. I think that for each ClusterID we only need to take the maximum once, and then accumulate across clusters.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
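The aggregation fix the issue describes (take the maximum per ClusterID, then sum across clusters) can be sketched as follows. This is a hedged illustration only: `NnReport`, its fields, and `totalLiveNodes` are hypothetical stand-ins, not the real `MembershipStats`/`RBFMetrics` API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical per-NameNode report; the real records live in the StateStore.
record NnReport(String clusterId, int numLiveNodes) {}

public class Agg {
  /**
   * NameNodes of the same cluster see the same datanodes, so summing every
   * report over-counts. Instead: max per cluster, then sum across clusters.
   */
  static int totalLiveNodes(List<NnReport> reports) {
    Map<String, Integer> maxPerCluster = reports.stream().collect(
        Collectors.toMap(NnReport::clusterId, NnReport::numLiveNodes, Math::max));
    return maxPerCluster.values().stream().mapToInt(Integer::intValue).sum();
  }
}
```

With two NameNodes in `ns1` each reporting 10 live nodes and one in `ns2` reporting 5, a naive sum would give 25, while the per-cluster maximum gives the correct 15.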