UENISHI Kota created HDDS-6465:
----------------------------------

             Summary: Measure and expose DU metrics
                 Key: HDDS-6465
                 URL: https://issues.apache.org/jira/browse/HDDS-6465
             Project: Apache Ozone
          Issue Type: Improvement
          Components: Ozone Datanode
    Affects Versions: 1.2.0
            Reporter: UENISHI Kota


We need metrics on du run statistics, like this:


{noformat}
# HELP du_started_count total count of du started per data directory
du_started_count{path="/ozone/data/storage1", node="node1.example.com"} 234
# HELP du_finished_count total count of du done per data directory
du_finished_count{path="/ozone/data/storage1", node="node1.example.com"} 233
# HELP du_latency_time du latency in total (milli)seconds
du_latency_time{path="/ozone/data/storage1", node="node1.example.com"} 123423e+10
{noformat}
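
A minimal sketch of how such counters could be collected, assuming a plain Java wrapper around whatever call performs the du measurement. The class name DuMetrics, its methods, and the LongSupplier hook are all hypothetical and not part of Ozone; a real patch would presumably register these values with the datanode's existing metrics system instead of rendering text by hand. Label values are not escaped here.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper (not an Ozone class): tracks du runs for one data directory. */
final class DuMetrics {
  private final String path;
  private final String node;
  private final AtomicLong started = new AtomicLong();
  private final AtomicLong finished = new AtomicLong();
  private final AtomicLong latencyMillis = new AtomicLong();

  DuMetrics(String path, String node) {
    this.path = path;
    this.node = node;
  }

  /** Runs one usage measurement and records its start, finish, and elapsed time. */
  long measure(LongSupplier usageSource) {
    started.incrementAndGet();
    long begin = System.nanoTime();
    try {
      return usageSource.getAsLong();
    } finally {
      // Recorded on success and on failure, so started > finished
      // only while a run is still in flight.
      latencyMillis.addAndGet((System.nanoTime() - begin) / 1_000_000L);
      finished.incrementAndGet();
    }
  }

  long startedCount() { return started.get(); }

  long finishedCount() { return finished.get(); }

  long totalLatencyMillis() { return latencyMillis.get(); }

  /** Renders the counters as Prometheus-style text, similar to the sample above. */
  String toPrometheusText() {
    String labels = "{path=\"" + path + "\", node=\"" + node + "\"}";
    return "# HELP du_started_count total count of du started per data directory\n"
        + "du_started_count" + labels + " " + started.get() + "\n"
        + "# HELP du_finished_count total count of du done per data directory\n"
        + "du_finished_count" + labels + " " + finished.get() + "\n"
        + "# HELP du_latency_time du latency in total milliseconds\n"
        + "du_latency_time" + labels + " " + latencyMillis.get() + "\n";
  }

  public static void main(String[] args) {
    DuMetrics metrics = new DuMetrics("/ozone/data/storage1", "node1.example.com");
    // Stand-in for the real du call; returns a fixed byte count.
    metrics.measure(() -> 4096L);
    System.out.print(metrics.toPrometheusText());
  }
}
```

With counters like these, a started count that stays ahead of the finished count indicates a du run in progress, and the latency total divided by the finished count gives the average run time per directory.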


Datanodes run the du command to measure disk usage by block files. It can put 
a fairly heavy load on the disk device due to the recursive nature of du, 
especially when block files are relatively small (e.g. the small file problem 
in local file systems). du alone is not that heavy a load, but when it 
overlaps with container scan tasks, it is hard to observe that du is an 
additional load on the disk. (The default interval of the container metadata 
scan is 3h and the du interval is 1h; I have already changed them in our 
environment.)
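
To illustrate why the cost grows with the number of block files, here is a rough sketch of a recursive usage scan in plain Java. It is an illustration only: the datanode runs the du command, not this code, and du reports allocated disk blocks whereas this sketch sums apparent file sizes. The class name RecursiveUsage is hypothetical. The point is that every file costs one metadata lookup, so many small files mean many lookups for the same total bytes.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical illustration (not an Ozone class) of a du-like recursive walk. */
final class RecursiveUsage {
  final long bytes;  // sum of apparent file sizes under the root
  final long files;  // number of regular files visited

  private RecursiveUsage(long bytes, long files) {
    this.bytes = bytes;
    this.files = files;
  }

  /** Walks the whole tree under root, reading the attributes of every file. */
  static RecursiveUsage scan(Path root) throws IOException {
    long[] totals = new long[2];  // [0] = bytes, [1] = file count
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (attrs.isRegularFile()) {
          totals[0] += attrs.size();
          totals[1]++;
        }
        return FileVisitResult.CONTINUE;
      }
    });
    return new RecursiveUsage(totals[0], totals[1]);
  }

  public static void main(String[] args) throws IOException {
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    RecursiveUsage usage = scan(root);
    System.out.println(usage.files + " files, " + usage.bytes + " bytes under " + root);
  }
}
```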

We can't easily observe du load unless we log in to the datanode and run "top" 
or similar, or [set the log level to 
debug|https://github.com/apache/ozone/blob/ozone-1.2.1/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/fs/AbstractSpaceUsageSource.java#L60].
 That log message should be at INFO level, IMO.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
