[ https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877308#comment-16877308 ]
Íñigo Goiri commented on HDFS-14624:
------------------------------------

We have similar issues internally.
Feel free to propose more logs to add.
Otherwise we can go with [^HDFS-14624.001.patch].

> When decommissioning a node, log remaining blocks to replicate periodically
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14624
>                 URL: https://issues.apache.org/jira/browse/HDFS-14624
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14624.001.patch
>
>
> When a node is marked for decommission, a monitor thread runs every
> 30 seconds by default and checks whether the node still has pending blocks
> to be replicated before the node can complete decommissioning.
> There are two existing debug-level messages logged in the monitor thread,
> DatanodeAdminManager$Monitor.check(), which already log the correct
> information. The first is logged as the pending blocks are replicated:
> {code:java}
> LOG.debug("Node {} still has {} blocks to replicate "
>     + "before it is a candidate to finish {}.",
>     dn, blocks.size(), dn.getAdminState());{code}
> The second is logged after the initial set of blocks has completed and a
> rescan happens:
> {code:java}
> LOG.debug("Node {} {} healthy."
>     + " It needs to replicate {} more blocks."
>     + " {} is still in progress.", dn,
>     isHealthy ? "is": "isn't", blocks.size(), dn.getAdminState());{code}
> I would like to propose moving these messages to INFO level so it is easier
> to monitor decommission progress over time from the Namenode log.
> Based on the default settings, this would result in at most 1 log message per
> node being decommissioned every 30 seconds. This is an upper bound because
> the monitor thread stops after checking 500K blocks in a run, so in practice
> there could be as few as 1 log message per 30 seconds, even if many DNs are
> being decommissioned at the same time.
> Note that the namenode webUI does display the above information, but having
> this in the NN logs would allow progress to be tracked more easily.
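
As a rough illustration of the message-rate reasoning in the description, the sketch below models one monitor run in plain Java. It is a simplified model under stated assumptions, not the actual DatanodeAdminManager code: the class name, the method messagesPerTick, and the rule "stop visiting nodes once the block cap has been reached" are hypothetical and chosen only to show why the count ranges from one message per node down to a single message per run.

```java
import java.util.List;

final class DecommissionLogEstimate {
    /** Per-run cap on blocks examined; 500K is the figure quoted in the description. */
    static final long BLOCKS_PER_TICK = 500_000L;

    /**
     * Returns how many progress messages one monitor run would emit in this
     * simplified model: nodes are visited in order, one message is logged per
     * visited node, and the run ends once the block cap has been reached.
     */
    static int messagesPerTick(List<Long> pendingBlocksPerNode, long blocksPerTick) {
        long checked = 0;
        int messages = 0;
        for (long pending : pendingBlocksPerNode) {
            if (checked >= blocksPerTick) {
                break; // cap reached: remaining nodes wait for a later run
            }
            checked += pending;
            messages++;
        }
        return messages;
    }

    public static void main(String[] args) {
        // Three small nodes: every node is visited, so one message each.
        System.out.println(messagesPerTick(List.of(1_000L, 2_000L, 3_000L), BLOCKS_PER_TICK));
        // Three large nodes: the first alone exhausts the cap, so a single message.
        System.out.println(messagesPerTick(List.of(600_000L, 600_000L, 600_000L), BLOCKS_PER_TICK));
    }
}
```

With the inputs in main, the model prints 3 for the small nodes and 1 for the large ones, matching the "at most 1 per node, as little as 1 per 30 seconds" range described above.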