UENISHI Kota created HDDS-6465:
----------------------------------
Summary: Measure and expose DU metrics
Key: HDDS-6465
URL: https://issues.apache.org/jira/browse/HDDS-6465
Project: Apache Ozone
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 1.2.0
Reporter: UENISHI Kota
We need metrics on du runs, like this:
{noformat}
# HELP total count of du started per data directory
du_started_count{path="/ozone/data/storage1", node="node1.example.com"} 234
# HELP total count of du done per data directory
du_finished_count{path="/ozone/data/storage1", node="node1.example.com"} 233
# HELP du latency in total (milli)seconds
du_latency_time{path="/ozone/data/storage1", node="node1.example.com"} 123423e+10
{noformat}
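As a rough illustration of how lines like the ones above could be rendered, here is a minimal Java sketch using only the JDK. The class and method names (DuMetricLines, help, sample, escape) are hypothetical and not part of Ozone. Note that the Prometheus text format expects the metric name right after "# HELP", so the sketch includes it.

```java
import java.util.Locale;

// Hypothetical helper, not an Ozone class: renders metric lines in the
// Prometheus text exposition format.
final class DuMetricLines {

    // "# HELP <metric name> <description>"
    static String help(String name, String description) {
        return "# HELP " + name + " " + description;
    }

    // "<metric name>{path="...",node="..."} <value>"
    static String sample(String name, String path, String node, long value) {
        return String.format(Locale.ROOT, "%s{path=\"%s\",node=\"%s\"} %d",
            name, escape(path), escape(node), value);
    }

    // Label values must escape backslash, double quote and newline.
    static String escape(String labelValue) {
        return labelValue
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(help("du_started_count",
            "total count of du started per data directory"));
        System.out.println(sample("du_started_count",
            "/ozone/data/storage1", "node1.example.com", 234));
    }
}
```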
Datanodes run the du command to measure disk usage by block files. Because du
is recursive, it can put a fairly heavy load on the disk device, especially
when block files are relatively small (e.g. the small file problem in local
file systems). du alone is not that heavy a load, but when it overlaps with
container scan tasks, it is relatively hard to tell how much additional load
du puts on the disk. (The default interval of the container metadata scan is
3h and the du interval is 1h; I have already changed them in our
environment.)
We can't easily observe du load unless we log in to the datanode and run "top"
or similar, or [enable debug-level
logging|https://github.com/apache/ozone/blob/ozone-1.2.1/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/fs/AbstractSpaceUsageSource.java#L60].
The log level should be INFO, IMO.
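To make the idea concrete, here is a minimal sketch of how the three values could be collected around each du run. It uses only the JDK. DuStats and its methods are hypothetical names, and the sketch makes no claim about how Ozone invokes du or registers its metrics. One instance per data directory is assumed.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical per-directory statistics holder, not an Ozone class.
final class DuStats {
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong finished = new AtomicLong();
    private final AtomicLong totalLatencyMillis = new AtomicLong();

    // Runs the given du call, counting the start, the completion and the
    // elapsed time. The completion is counted even if the call throws, so
    // started - finished is the number of du runs still in progress.
    long measure(LongSupplier duCall) {
        started.incrementAndGet();
        long begin = System.nanoTime();
        try {
            return duCall.getAsLong();
        } finally {
            totalLatencyMillis.addAndGet((System.nanoTime() - begin) / 1_000_000);
            finished.incrementAndGet();
        }
    }

    long startedCount() { return started.get(); }
    long finishedCount() { return finished.get(); }
    long totalLatencyMillis() { return totalLatencyMillis.get(); }

    public static void main(String[] args) {
        DuStats stats = new DuStats();
        // Stand-in for the real du invocation; returns used bytes.
        long usedBytes = stats.measure(() -> 4096L);
        System.out.println("used=" + usedBytes
            + " started=" + stats.startedCount()
            + " finished=" + stats.finishedCount()
            + " latencyMs=" + stats.totalLatencyMillis());
    }
}
```

A started count that stays ahead of the finished count for a long time, as in the 234 vs. 233 sample above, would then indicate a du run that is still in progress.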
--
This message was sent by Atlassian Jira
(v8.20.1#820001)