[ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043335#comment-16043335 ]
Arpit Agarwal edited comment on HDFS-11907 at 6/8/17 8:10 PM:
--------------------------------------------------------------

Hi [~andrew.wang], you are right that it's expected to be a cheap call, but calling it once per second per volume seems excessive. Do you see any benefit to querying {{df}} once per second? We can make the caching interval configurable and leave the default at 1 second if you prefer.

This is not the same as changing the health check interval, as Chen mentioned. Keeping the health check interval at 1 second lets us detect process failure faster, and we don't want to change that.

Also, the v4 patch has a couple of issues I missed earlier. [~vagarychen], can you please take a look at these?
# availableSpace and availableSpaceTimeStamp should be members of checkedVolume.
# The test case failure in TestNameNodeResourceChecker needs to be addressed. An easy fix is to check all volumes instead of trying to query a specific one.
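To make the suggestion above concrete, here is a minimal sketch of per-volume caching with a configurable interval. It is not the code from any of the attached patches. Only the field names availableSpace and availableSpaceTimeStamp come from the comment; the class name {{CachedAvailableSpace}}, the constructor parameters, and the use of {{LongSupplier}} as a stand-in for {{df.getAvailable()}} and for the clock are assumptions made so the example is self-contained.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not the HDFS-11907 patch. One instance would belong
 * to one volume, so each volume keeps its own cached value and timestamp.
 */
final class CachedAvailableSpace {
  private final LongSupplier dfQuery;     // assumption: stands in for df.getAvailable()
  private final LongSupplier clockMs;     // assumption: monotonic time source, in milliseconds
  private final long cacheIntervalMs;     // assumption: the configurable caching interval

  private long availableSpace;            // last value returned by dfQuery
  private long availableSpaceTimeStamp;   // clock reading when availableSpace was refreshed
  private boolean loaded;                 // false until the first query has happened

  CachedAvailableSpace(LongSupplier dfQuery, LongSupplier clockMs, long cacheIntervalMs) {
    this.dfQuery = dfQuery;
    this.clockMs = clockMs;
    this.cacheIntervalMs = cacheIntervalMs;
  }

  /** Returns the cached value, refreshing it only when it is older than the interval. */
  synchronized long getAvailable() {
    long now = clockMs.getAsLong();
    if (!loaded || now - availableSpaceTimeStamp >= cacheIntervalMs) {
      availableSpace = dfQuery.getAsLong();
      availableSpaceTimeStamp = now;
      loaded = true;
    }
    return availableSpace;
  }
}
```

With this shape the health check can still run every second while the underlying query runs at most once per caching interval per volume. Injecting the clock also lets a test advance time without sleeping.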
> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---------------------------------------------------------------------------
>
> Key: HDFS-11907
> URL: https://issues.apache.org/jira/browse/HDFS-11907
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, HDFS-11907.003.patch, HDFS-11907.004.patch
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes {{NameNode#monitorHealth}}, which ends up invoking {{NameNodeResourceChecker#isResourceAvailable}} once per second by default. {{NameNodeResourceChecker#isResourceAvailable}} in turn invokes {{df.getAvailable()}} every time it is called.
> Available space rarely changes dramatically from one second to the next, so a cached value should be sufficient: fetch an updated value only when the cached value is too old, and otherwise simply return the cached value. This way {{df.getAvailable()}} is invoked less often.
> Thanks [~arpitagarwal] for the offline discussion.