[ https://issues.apache.org/jira/browse/HDFS-16566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruinan Gu updated HDFS-16566:
-----------------------------
    Description:
Simple case:
RS-3-2, internal blocks [0(busy),2,3,4] (block 1 is missing). DN0 is busy, i.e. DN0.getNumberOfBlocksToBeReplicated() >= maxReplicationStreams, and the priority of the recovery is not QUEUE_HIGHEST_PRIORITY.
Because the busy DN is skipped as a source, we get liveblockIndice=[2,3,4] and additionalRepl=1, so the DN receives LiveBitSet=[2,3,4] and targets.length=1.
According to StripedWriter.initTargetIndices(), block 0 is recovered instead of block 1, so the internal blocks become [0(busy),2,3,4,0'(excess)].
Although the NN will detect and delete the excess replica and then recover the missing block (1) correctly after the wrong recovery of 0', I don't think this process is expected; the recovery of 0' is wrong and unnecessary.

  was:
Simple case:
RS-3-2, internal blocks [0(busy),2,3,4] (block 1 is missing), liveblockIndice=[2,3,4], additionalRepl=1, so the DN receives LiveBitSet=[2,3,4] and targets.length=1.
According to StripedWriter.initTargetIndices(), block 0 is recovered instead of block 1, so the internal blocks become [0(busy),2,3,4,0'(excess)].
Although the NN will detect and delete the excess replica and then recover the missing block (1) correctly after the wrong recovery of 0', I don't think this process is expected; the recovery of 0' is wrong and unnecessary.

> Erasure Coding: Recovery may cause excess replicas when busy DN exists
> ----------------------------------------------------------------------
>
>                 Key: HDFS-16566
>                 URL: https://issues.apache.org/jira/browse/HDFS-16566
>             Project: Hadoop HDFS
>          Issue Type: Bug
>   Affects Versions: 3.3.2
>            Reporter: Ruinan Gu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
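
The index selection described in the report can be illustrated with a small standalone sketch. It is a simplified model, not the actual StripedWriter.initTargetIndices() code: it assumes the DN simply takes the first targets.length indices that are absent from the live bit set. The class name, the helper names, and the `excluded` parameter are hypothetical and exist only for this illustration; in particular, the `excluded` set is one conceivable way to tell the DN about the busy index, not the project's fix.

```java
import java.util.Arrays;
import java.util.BitSet;

public class TargetIndexSketch {

    // Simplified model (assumption): choose the first `numTargets` block
    // indices that are neither marked live nor explicitly excluded.
    static int[] pickTargets(BitSet live, BitSet excluded,
                             int totalBlocks, int numTargets) {
        int[] targets = new int[numTargets];
        int m = 0;
        for (int i = 0; i < totalBlocks && m < numTargets; i++) {
            if (!live.get(i) && !excluded.get(i)) {
                targets[m++] = i;
            }
        }
        // Trim in case fewer candidates than requested were found.
        return Arrays.copyOf(targets, m);
    }

    // Hypothetical helper: build a BitSet from a list of indices.
    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int i : indices) {
            set.set(i);
        }
        return set;
    }

    public static void main(String[] args) {
        int totalBlocks = 5;          // RS-3-2: 3 data + 2 parity blocks
        BitSet live = bits(2, 3, 4);  // index 0 is busy, so it is not reported as live

        // The DN cannot tell "busy" from "missing": it picks index 0.
        System.out.println(Arrays.toString(
            pickTargets(live, new BitSet(), totalBlocks, 1)));  // prints [0]

        // If the busy index were passed along separately, index 1 is picked.
        System.out.println(Arrays.toString(
            pickTargets(live, bits(0), totalBlocks, 1)));       // prints [1]
    }
}
```

The first call reproduces the reported outcome, where 0' is rebuilt as an excess replica while block 1 stays missing. The second call shows that the wrong choice comes only from the busy index being indistinguishable from a missing one.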