[ 
https://issues.apache.org/jira/browse/HDFS-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022293#comment-18022293
 ] 

ASF GitHub Bot commented on HDFS-17568:
---------------------------------------

github-actions[bot] commented on PR #6921:
URL: https://github.com/apache/hadoop/pull/6921#issuecomment-3325980578

   We're closing this stale PR because it has been open for 100 days with no 
activity. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you feel like this was a mistake, or you would like to continue working 
on it, please feel free to re-open it and ask for a committer to remove the 
stale tag and review again.
   Thanks all for your contribution.




> [Decommission]Show Aggregated Reason for Why Low-Redundancy Block is Skipped 
> for Reconstruction
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17568
>                 URL: https://issues.apache.org/jira/browse/HDFS-17568
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: wuchang
>            Priority: Major
>              Labels: pull-request-available
>
> Troubleshooting decommission is very hard, even in DEBUG mode.
> In some cases, when decommission has taken a long time and we are not sure 
> whether it is still in progress, we run {{-refreshNodes}} to try to 
> re-trigger the decommission (which in fact has no effect).
> Then we check the NameNode logs, but unfortunately we cannot find any useful 
> log showing whether our {{refreshNodes}} subcommand has had any effect.
> So, my change is:
>  * I changed the level of this critical log from TRACE to INFO, since it is 
> not logged repeatedly and it gives administrators critical information 
> about what happened.
> {code:java}
>     } else {
>       LOG.info("startDecommission: Node {} in {}, nothing to do.",
>           node, node.getAdminState());
>     }{code}
>  * When the reconstruction that is triggered by node decommission is skipped, 
> we want to know the reason. The reasons fall into 3 categories:
>  ## No source node is available
>  ## No target node is available
>  ## ReconstructionWork is built but validation failed
>       I put these reasons in a single enum {{ReconstructionSkipReason}}. 
> In DEBUG mode, the reasons are aggregated and shown to users.
> The log aggregated by {{ReconstructionSkipReason}} looks like this:
> {code:java}
> 2024-07-10 02:59:09,707 [Thread-990] DEBUG blockmanagement.BlockManager: 
> Block blk_3_0 is not scheduled for reconstruction since: [ source node or 
> storage unavailable on node [DISK]storageID_0_3:NORMAL:127.0.0.1:9866. Detail 
> : [stored replica state is corrupt or excess] source node or storage 
> unavailable on node [DISK]storageID_1_3:NORMAL:127.0.0.1:9866. Detail : 
> [replica is already decommissioned] ]{code}
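
To illustrate the aggregation idea described above, here is a minimal, standalone sketch. It is not the code from the PR: the enum constant names, the `SkipReasonCollector` helper, and the output format are all assumptions made for illustration, and only the enum name `ReconstructionSkipReason` and the three categories come from the issue description.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of grouping skip details by reason; not the PR's code. */
class SkipReasonSketch {

  /** The three categories from the issue; constant names and texts are assumed. */
  enum ReconstructionSkipReason {
    SOURCE_UNAVAILABLE("source node or storage unavailable"),
    TARGET_UNAVAILABLE("no target node available"),
    VALIDATION_FAILED("reconstruction work built but validation failed");

    private final String text;

    ReconstructionSkipReason(String text) {
      this.text = text;
    }

    String text() {
      return text;
    }
  }

  /** Hypothetical helper that collects per-block details under each reason. */
  static final class SkipReasonCollector {
    // EnumMap iterates in enum declaration order, so the summary is stable.
    private final Map<ReconstructionSkipReason, List<String>> details =
        new EnumMap<>(ReconstructionSkipReason.class);

    void add(ReconstructionSkipReason reason, String detail) {
      details.computeIfAbsent(reason, r -> new ArrayList<>()).add(detail);
    }

    boolean isEmpty() {
      return details.isEmpty();
    }

    /** Builds one line such as "[reason: [detail, detail]; reason: [detail]]". */
    String summarize() {
      StringJoiner out = new StringJoiner("; ", "[", "]");
      for (Map.Entry<ReconstructionSkipReason, List<String>> e : details.entrySet()) {
        out.add(e.getKey().text() + ": " + e.getValue());
      }
      return out.toString();
    }
  }

  public static void main(String[] args) {
    SkipReasonCollector collector = new SkipReasonCollector();
    collector.add(ReconstructionSkipReason.SOURCE_UNAVAILABLE,
        "stored replica state is corrupt or excess");
    collector.add(ReconstructionSkipReason.SOURCE_UNAVAILABLE,
        "replica is already decommissioned");
    // A real NameNode would emit this through its logger at DEBUG level.
    System.out.println("Block blk_3_0 is not scheduled for reconstruction since: "
        + collector.summarize());
  }
}
```

Grouping by an enum keeps the set of reasons closed and lets one log line per block replace several scattered messages; whether the actual patch groups details this way is not stated in the issue.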



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

