[jira] [Comment Edited] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

Arpit Agarwal (JIRA) Thu, 08 Jun 2017 13:11:40 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043335#comment-16043335
 ]


Arpit Agarwal edited comment on HDFS-11907 at 6/8/17 8:10 PM:
--------------------------------------------------------------

Hi [~andrew.wang], you are right that it's expected to be a cheap call, but 
calling it once per second per volume seems excessive. Do you see any benefit 
to querying {{df}} once per second? We can make the caching interval 
configurable and leave the default at 1 second if you prefer.

This is not the same as changing the health check interval as Chen mentioned. 
Keeping the health check interval at 1 second lets us detect process failure 
faster and we don't want to change that.

Also, the v4 patch has a couple of issues I missed earlier. [~vagarychen], can 
you please take a look at these?
# {{availableSpace}} and {{availableSpaceTimeStamp}} should be members of {{CheckedVolume}}.
# The test case failure in {{TestNameNodeResourceChecker}} needs to be addressed. 
An easy fix is to check all volumes instead of trying to query a specific one.
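
A minimal sketch of the per-volume caching discussed above, not the code from any of the attached patches. The class name {{CachedVolume}}, the injected clock, and the {{refreshIntervalMs}} parameter are hypothetical; the {{dfQuery}} supplier only stands in for {{df.getAvailable()}}. It keeps {{availableSpace}} and {{availableSpaceTimeStamp}} on the per-volume object, so each volume ages its own cached value while the health check itself can keep running every second.

```java
import java.util.function.LongSupplier;

/** Hypothetical per-volume wrapper; illustrative only, not Hadoop code. */
class CachedVolume {
  private final LongSupplier dfQuery;     // assumption: stands in for df.getAvailable()
  private final LongSupplier clockMillis; // assumption: injectable clock, in milliseconds
  private final long refreshIntervalMs;   // assumption: the configurable caching interval

  // Cached state is a member of the volume, so volumes do not share a timestamp.
  private long availableSpace;
  private long availableSpaceTimeStamp;
  private boolean initialized = false;

  CachedVolume(LongSupplier dfQuery, LongSupplier clockMillis, long refreshIntervalMs) {
    this.dfQuery = dfQuery;
    this.clockMillis = clockMillis;
    this.refreshIntervalMs = refreshIntervalMs;
  }

  /** Returns the cached value, querying the underlying source only when it is stale. */
  synchronized long getAvailable() {
    long now = clockMillis.getAsLong();
    if (!initialized || now - availableSpaceTimeStamp >= refreshIntervalMs) {
      availableSpace = dfQuery.getAsLong();
      availableSpaceTimeStamp = now;
      initialized = true;
    }
    return availableSpace;
  }
}
```

With an interval of 5000 ms, a checker that calls {{getAvailable()}} once per second would reach the underlying query roughly once every five calls per volume.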


> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11907
>                 URL: https://issues.apache.org/jira/browse/HDFS-11907
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, 
> HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes 
> {{NameNode#monitorHealth}}, which ends up invoking 
> {{NameNodeResourceChecker#isResourceAvailable}} once per second by default. 
> {{NameNodeResourceChecker#isResourceAvailable}} in turn invokes 
> {{df.getAvailable()}} every time it is called.
> Since the available space rarely changes dramatically from one second to the 
> next, a cached value should be sufficient: fetch an updated value only when 
> the cached value is too old, and otherwise simply return the cached value. 
> This way {{df.getAvailable()}} is invoked less often.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
