[ https://issues.apache.org/jira/browse/HDFS-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wudeyu updated HDFS-16100:
--------------------------
    Attachment: HDFS-16100.001.patch

> HA: Improve performance of Standby node transition to Active
> -------------------------------------------------------------
>
>                 Key: HDFS-16100
>                 URL: https://issues.apache.org/jira/browse/HDFS-16100
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.1
>            Reporter: wudeyu
>            Assignee: wudeyu
>            Priority: Major
>         Attachments: HDFS-16100.001.patch, HDFS-16100.patch
>
>
> pendingDNMessages on the Standby node holds postponed block reports so that
> they can be processed later. Block reports in pendingDNMessages are handled
> as follows:
> # If the generation stamp (GS) of the replica is in the future, the Standby
> node processes the report when the corresponding edit log entry (e.g.
> add_block) is loaded.
> # If the replica is corrupted, the Standby node processes the report while
> it transitions to Active.
> # If the DataNode is removed, its block reports are removed from
> pendingDNMessages.
> As the number of corrupted replicas grows, the transition therefore takes
> longer. In our case, there were 60 million block reports in
> pendingDNMessages before the transition. Processing them took almost 7
> minutes, and it was killed by zkfc. Most of these block reports have replica
> state RBW with a wrong GS (less than that of the stored block on the Standby
> node).
> In my opinion, the Standby node could ignore block reports whose replica
> state is RBW with a wrong GS, because the Active node/DataNode will remove
> those replicas later.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
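
For readers following the proposal in the description, the rule "ignore block reports whose replica state is RBW with a GS lower than the stored block" could be sketched roughly as below. This is only an illustration under stated assumptions, not the content of the attached patch: PendingReportFilterSketch, QueuedReport, canIgnore and dropStaleRbw are hypothetical names, the ReplicaState enum is a local stand-in, and the stored generation stamps are modelled as a plain map from block ID to GS rather than the NameNode's real data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are the real HDFS classes.
class PendingReportFilterSketch {

    /** Local stand-in for replica states, defined for this sketch only. */
    enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

    /** Hypothetical model of one postponed block report entry. */
    static final class QueuedReport {
        final long blockId;
        final long reportedGenStamp;
        final ReplicaState state;

        QueuedReport(long blockId, long reportedGenStamp, ReplicaState state) {
            this.blockId = blockId;
            this.reportedGenStamp = reportedGenStamp;
            this.state = state;
        }
    }

    /**
     * Returns true when the report is an RBW replica whose GS is lower than
     * the GS of the stored block. If the block is unknown (no stored GS),
     * the report is conservatively kept.
     */
    static boolean canIgnore(QueuedReport report, Long storedGenStamp) {
        if (storedGenStamp == null) {
            return false;
        }
        return report.state == ReplicaState.RBW
                && report.reportedGenStamp < storedGenStamp;
    }

    /** Returns the reports that would still need processing on transition. */
    static List<QueuedReport> dropStaleRbw(List<QueuedReport> pending,
                                           Map<Long, Long> storedGenStamps) {
        List<QueuedReport> kept = new ArrayList<>();
        for (QueuedReport report : pending) {
            if (!canIgnore(report, storedGenStamps.get(report.blockId))) {
                kept.add(report);
            }
        }
        return kept;
    }
}
```

In this sketch, reports with a future GS, reports in any state other than RBW, and reports for blocks the map does not know about all stay in the queue; only stale RBW reports are dropped, which matches the case the description says dominated the 60 million queued reports.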