[ https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangzhaobo99 reassigned HDFS-17305:
------------------------------------

    Assignee: huangzhaobo99

> Add avoid datanode reason count related metrics to namenode.
> ------------------------------------------------------------
>
>                 Key: HDFS-17305
>                 URL: https://issues.apache.org/jira/browse/HDFS-17305
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: huangzhaobo99
>            Assignee: huangzhaobo99
>            Priority: Minor
>
> Currently, HDFS provides slow-node avoidance and load avoidance when choosing 
> DataNodes, mainly implemented in the BlockPlacementPolicyDefault class.
> 1. When an exclusion condition is triggered, the NameNode prints a log 
> message, and these logs can be checked to troubleshoot anomalies on the 
> NameNode. The relevant code is as follows:
> {code:java}
> ...
> if (!node.isInService()) {
>   logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
>   return false;
> }
> if (avoidStaleNodes) {
>   if (node.isStale(this.staleInterval)) {
>     logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
>     return false;
>   }
> }
> ...
> {code}
> 2. When an exclusion condition is triggered, could we also record it in a 
> metric and count the total number of exclusions for each reason?
> 3. These metrics could then be collected with Prometheus and visualized in 
> Grafana to observe how the cluster behaves when selecting DataNodes.
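A minimal, framework-free sketch of points 1 to 3, assuming one counter per reason that is incremented at each place where logNodeIsNotChosen is called today. The class name AvoidDatanodeReasonCounters, the Node record, recordNotChosen, count, toPrometheusText, and the metric name avoid_datanode_total are all hypothetical; the enum is a local stand-in for NodeNotChosenReason that contains only the two reasons quoted above. An actual patch would more likely register these counters through the NameNode's metrics2 system than render Prometheus text by hand.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-reason exclusion counters for DataNode selection.
final class AvoidDatanodeReasonCounters {

  // Local stand-in for NodeNotChosenReason; only the two reasons quoted above.
  enum NodeNotChosenReason { NOT_IN_SERVICE, NODE_STALE }

  // Hypothetical minimal view of a candidate DataNode.
  record Node(String name, boolean inService, boolean stale) {}

  // One counter per reason. The map is filled once in the constructor and never
  // modified afterwards, so concurrent callers only touch the LongAdder values.
  private final Map<NodeNotChosenReason, LongAdder> counts =
      new EnumMap<>(NodeNotChosenReason.class);

  AvoidDatanodeReasonCounters() {
    for (NodeNotChosenReason r : NodeNotChosenReason.values()) {
      counts.put(r, new LongAdder());
    }
  }

  // Would be called alongside logNodeIsNotChosen(node, reason).
  void recordNotChosen(Node node, NodeNotChosenReason reason) {
    counts.get(reason).increment();
  }

  // Simplified version of the quoted checks, counting each exclusion.
  boolean isGoodDatanode(Node node, boolean avoidStaleNodes) {
    if (!node.inService()) {
      recordNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
      return false;
    }
    if (avoidStaleNodes && node.stale()) {
      recordNotChosen(node, NodeNotChosenReason.NODE_STALE);
      return false;
    }
    return true;
  }

  long count(NodeNotChosenReason reason) {
    return counts.get(reason).sum();
  }

  // Renders the counters in the Prometheus text exposition format,
  // one labeled sample per reason (EnumMap iterates in declaration order).
  String toPrometheusText() {
    StringBuilder sb = new StringBuilder("# TYPE avoid_datanode_total counter\n");
    for (Map.Entry<NodeNotChosenReason, LongAdder> e : counts.entrySet()) {
      sb.append("avoid_datanode_total{reason=\"")
          .append(e.getKey().name())
          .append("\"} ")
          .append(e.getValue().sum())
          .append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    AvoidDatanodeReasonCounters c = new AvoidDatanodeReasonCounters();
    c.isGoodDatanode(new Node("dn1", false, false), true); // NOT_IN_SERVICE
    c.isGoodDatanode(new Node("dn2", true, true), true);   // NODE_STALE
    c.isGoodDatanode(new Node("dn3", true, true), false);  // stale, but not avoided
    c.isGoodDatanode(new Node("dn4", true, false), true);  // chosen
    System.out.print(c.toPrometheusText());
  }
}
```

With the four nodes in main, one node is excluded as NOT_IN_SERVICE and one as NODE_STALE; dn3 is stale but passes because avoidStaleNodes is false. LongAdder is used because many threads may choose targets concurrently, and it keeps contended increments cheap compared with a single AtomicLong.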



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
