[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844437#comment-17844437 ]
Adam Binford edited comment on HDFS-15634 at 5/7/24 8:47 PM:
-------------------------------------------------------------

Chiming in on a 4-year-old issue: we've hit similar problems where returning a node to service after decommissioning it causes the active NameNode to fail over, because it holds the write lock too long while processing the redundant blocks. In our case we have roughly 1.2 million blocks on a single DataNode across three drives. I assume `DatanodeStorageInfo` represents a single drive? In that case the write lock is only released after processing all blocks on a single drive; a rough sketch of that per-storage locking pattern follows the quoted issue below.

> Invalidate block on decommissioning DataNode after replication
> --------------------------------------------------------------
>
>                 Key: HDFS-15634
>                 URL: https://issues.apache.org/jira/browse/HDFS-15634
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Major
>             Labels: pull-request-available
>        Attachments: write lock.png
>
>         Time Spent: 1h
> Remaining Estimate: 0h
>
> Right now when a DataNode starts decommissioning, the NameNode marks it as
> decommissioning; its blocks are replicated to other DataNodes, and the node
> is then marked as decommissioned. Those blocks are not touched afterwards,
> since they are no longer counted as live replicas.
> Proposal: Invalidate these blocks once they have been replicated and there
> are enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned DataNodes to finish the flow
> caused a NameNode latency spike, since the NameNode has to remove all of
> those blocks from its memory, and that step requires holding the write
> lock. If the blocks had been invalidated gradually, the deletion would be
> much faster and cheaper.
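For illustration only, here is a minimal Java sketch of the locking pattern described in the comment above, assuming each storage's full block report (one `DatanodeStorageInfo` per volume) is processed while the namesystem write lock is held and the lock is only dropped between storages. The class and method names here (`BlockReportSketch`, `processFullReport`, `markBlockRedundantAndQueueInvalidation`) are hypothetical stand-ins, not the actual BlockManager/FSNamesystem code.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not real HDFS code: it only illustrates why a single
// storage holding ~400k blocks (1.2M across three drives) can keep the
// namesystem write lock held for one long, uninterrupted pass.
public class BlockReportSketch {

  // Stand-in for the global namesystem lock shared by all NameNode operations.
  private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();

  // Stand-in for one volume's report (one DatanodeStorageInfo worth of blocks).
  record StorageReport(String storageId, List<Long> blockIds) {}

  public void processFullReport(List<StorageReport> storages) {
    for (StorageReport storage : storages) {
      namesystemLock.writeLock().lock();      // lock taken once per storage...
      try {
        for (long blockId : storage.blockIds()) {
          // ...but not released per block, so every other NameNode operation
          // stalls until the whole per-storage loop finishes.
          markBlockRedundantAndQueueInvalidation(blockId);
        }
      } finally {
        namesystemLock.writeLock().unlock();  // released only after the storage is done
      }
    }
  }

  private void markBlockRedundantAndQueueInvalidation(long blockId) {
    // Placeholder for the per-block bookkeeping: checking live replicas and
    // queueing the excess replica for invalidation.
  }
}
{code}

If the lock were released and re-acquired every N blocks, or if the excess replicas were invalidated gradually as this issue proposes, the time the write lock is held in any one stretch would be bounded instead of growing with the number of blocks per drive.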