[ https://issues.apache.org/jira/browse/HDFS-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee reassigned HDFS-11960:
---------------------------------

    Assignee: Kihwal Lee

> Successfully closed file can stay under-replicated.
> ---------------------------------------------------
>
>                 Key: HDFS-11960
>                 URL: https://issues.apache.org/jira/browse/HDFS-11960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> If a certain set of conditions holds at the time of a file creation, a block
> of the file can stay under-replicated. This is because the block is
> mistakenly taken out of the under-replicated block queue and never gets
> reevaluated.
> Re-evaluation can be triggered if
> - a replica-holding node dies,
> - setrep is called, or
> - the NN replication queues are reinitialized (NN failover or restart).
> If none of these happens, the block stays under-replicated.
> Here is how it happens.
> 1) A replica is finalized, but the ACK does not reach the upstream in time.
> The IBR is also delayed.
> 2) A close recovery happens, which updates the gen stamp of the "healthy"
> replicas.
> 3) The file is closed with the healthy replicas. It is added to the
> replication queue.
> 4) A replication is scheduled, so the block is added to the pending
> replication list. The replication target picked is the failed node from 1).
> 5) The old IBR is finally received for the failed/excluded node. In the
> meantime, the replication fails, because there is already a finalized replica
> (with an older gen stamp) on that node.
> 6) The IBR processing removes the block from the pending list, adds it to
> the corrupt replicas list, and then issues an invalidation. Since the block is
> in neither the replication queue nor the pending list, it stays
> under-replicated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
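The bookkeeping gap in steps 3)-6) can be sketched as a toy model. This is a minimal illustration, not actual NameNode code: the class and method names (UnderReplicationTracker, processStaleIBR, etc.) are hypothetical, and the two sets stand in for the under-replicated block queue and the pending replication list.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the NameNode bookkeeping described above.
// Two sets stand in for the under-replicated queue and the pending list;
// a block is "tracked" only while it sits in one of them.
public class UnderReplicationTracker {
    private final Set<String> neededReplications  = new HashSet<>(); // under-replicated queue
    private final Set<String> pendingReplications = new HashSet<>(); // in-flight replications

    // Step 3: file is closed under-replicated; block enters the replication queue.
    public void fileClosedUnderReplicated(String block) {
        neededReplications.add(block);
    }

    // Step 4: a replication is scheduled; block moves from the queue to the
    // pending list while the transfer is in flight.
    public void scheduleReplication(String block) {
        neededReplications.remove(block);
        pendingReplications.add(block);
    }

    // Steps 5-6: the stale IBR arrives and the scheduled replication fails.
    // The block is dropped from the pending list, but (per the bug) it is
    // never put back into neededReplications, so nothing re-evaluates it.
    public void processStaleIBR(String block) {
        pendingReplications.remove(block);
        // BUG: missing neededReplications.add(block) -- re-evaluation is lost.
    }

    public boolean isTracked(String block) {
        return neededReplications.contains(block) || pendingReplications.contains(block);
    }

    public static void main(String[] args) {
        UnderReplicationTracker nn = new UnderReplicationTracker();
        nn.fileClosedUnderReplicated("blk_1");
        nn.scheduleReplication("blk_1");
        nn.processStaleIBR("blk_1");
        // The block is now in neither structure: it stays under-replicated
        // until a replica-holding node dies, setrep runs, or the queues are
        // reinitialized, exactly as the report states.
        System.out.println("tracked=" + nn.isTracked("blk_1"));
    }
}
```

The model shows why the listed triggers (node death, setrep, queue reinitialization) are the only ways out: each of those paths rebuilds or re-scans the queue, while normal processing never looks at the block again.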