[ https://issues.apache.org/jira/browse/HDDS-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Yarovoy reassigned HDDS-15324:
-------------------------------------

    Assignee: Andrey Yarovoy

> [Ozone Dashboard] Create a dashboard that shows DataNode performance metrics
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-15324
>                 URL: https://issues.apache.org/jira/browse/HDDS-15324
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Andrey Yarovoy
>            Assignee: Andrey Yarovoy
>            Priority: Major
>
> Create a dashboard that shows DataNode performance metrics:
> h3. JVM (HddsDatanode)
>  * CPU: JVM process CPU load vs. overall system CPU load on the selected DataNode hosts.
>  * Heap: used, committed, and maximum heap memory.
>  * Garbage collection: share of CPU time spent in GC and how often collections occur.
>  * Netty: direct (off-heap) buffer usage vs. the configured maximum.
>  * Threads: number of JVM threads, broken down by state.
> h3. Ratis
>  * Log append throughput, flush rate, and RPC-style client read/write rates.
>  * Backlog (pending queue) and approximate timing snapshots for appends, follower appends, and log sync, plus the failed-write rate.
>  * All Ratis metrics are aggregated across Raft groups per DataNode, giving one series per selected node (scrape target).
> h3. Container I/O
> For common Xceiver operations (WriteChunk, ReadChunk, PutBlock, GetBlock, DeleteChunk, DeleteBlock, CreateContainer, CloseContainer):
>  * Operations per second, bytes per second, and average latency (CloseContainer has no byte count, so it shows only ops and latency).
> h3. Storage volume I/O
> Per selected DataNode, summed across disks: read/write throughput, read/write IOPS, read/write latency, and volume space used vs. capacity (excluding total-capacity rollup metrics).
> h3. SCM commands and background work
>  * Command handlers: for each SCM command type, panels for incoming command rate, handler invocation rate, run time, queue depth, and (optionally) thread-pool size, so SCM-driven work can be viewed separately per command type.
>  * Block deleting service: the background block-deletion pipeline, covering transactions, blocks/bytes succeeded or failed, pending/chosen/marked counts, retries, and outliers (e.g. lock timeouts, out-of-order transactions).
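The Container I/O panels in the description boil down to deriving per-second rates and an average latency from cumulative counters sampled at two points in time. A minimal Python sketch of that arithmetic, assuming each Xceiver operation exposes running totals of completed ops, summed latency, and (except for CloseContainer) bytes. The `OpCounters` class, its field names, and `panel_values` are hypothetical illustrations, not Ozone's actual metric names or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpCounters:
    """Cumulative counters for one Xceiver operation type (hypothetical shape)."""
    timestamp_s: float                 # when the snapshot was scraped
    ops_total: int                     # operations completed since process start
    latency_sum_ms: float              # total time spent in those operations
    bytes_total: Optional[int] = None  # bytes moved; None when the op has no byte count (e.g. CloseContainer)


def panel_values(prev: OpCounters, curr: OpCounters) -> dict:
    """Derive ops/s, bytes/s and average latency over the window between two snapshots."""
    dt = curr.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("snapshots must be in increasing time order")
    d_ops = curr.ops_total - prev.ops_total
    result = {
        "ops_per_s": d_ops / dt,
        # Average latency of the ops completed in this window, not since start-up.
        "avg_latency_ms": (curr.latency_sum_ms - prev.latency_sum_ms) / d_ops if d_ops else 0.0,
    }
    # Only emit a throughput value when the operation reports bytes.
    if prev.bytes_total is not None and curr.bytes_total is not None:
        result["bytes_per_s"] = (curr.bytes_total - prev.bytes_total) / dt
    return result


# Example: a WriteChunk-like op sampled 60 s apart.
before = OpCounters(0.0, 1000, 5000.0, 4_000_000)
after = OpCounters(60.0, 1600, 8600.0, 6_400_000)
print(panel_values(before, after))
# {'ops_per_s': 10.0, 'avg_latency_ms': 6.0, 'bytes_per_s': 40000.0}
```

This sketch ignores counter resets (for example after a DataNode restart); a real dashboard query would need to account for them.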



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
