[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933376#comment-16933376 ]
HuangTao edited comment on HDFS-14849 at 9/19/19 1:35 PM:
----------------------------------------------------------

I found a clue: `chooseSourceDatanodes` reports
{quote}LIVE=2, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0, MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0, STALESTORAGE=0, REDUNDANT=22{quote}
All block indices (0-8) exist. Three of them (3, 4 and 8) have no redundant copy. The datanode that stores block 8 is in DECOMMISSIONING state; the adminState of the other two datanodes is null.
{quote}[0, 1, 2, 3, 4, 5, 6, 7, 8, 6, 7, 6, 6, 5, 0, 1, 5, 0, 2, 5, 2, 5, 1, 2, 1, 5, 2, 7, 5, 2, 0]{quote}
`countNodes(block)` reports
{quote}LIVE=8, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0, MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0, STALESTORAGE=0, REDUNDANT=16{quote}
So block 8 needs to be replicated, but there are no racks left.

What I still don't understand is why some blocks are replicated more than once, instead of only block 8 being replicated.

was (Author: marvelrock):
I found a clue: `chooseSourceDatanodes` reports
{quote}LIVE=2, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0, MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0, STALESTORAGE=0, REDUNDANT=22{quote}
All block indices (0-8) exist. Three of them (3, 4 and 8) have no redundant copy. The datanode that stores block 8 is in DECOMMISSIONING state; the adminState of the other two datanodes is null.
`countNodes(block)` reports
{quote}LIVE=8, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0, MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0, STALESTORAGE=0, REDUNDANT=16{quote}
So block 8 needs to be replicated, but there are no racks left.

What I still don't understand is why some blocks are replicated more than once, instead of only block 8 being replicated.
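To make the "3/4/8 have no redundant block" observation easy to re-check, here is a minimal standalone sketch that tallies the block index list quoted in the comment. It does not use any HDFS code: the class `BlockIndexTally` and its methods are hypothetical names for this illustration only, not part of `BlockManager`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Hadoop.
final class BlockIndexTally {
    // Block indices exactly as quoted in the comment (one entry per stored internal block).
    static final int[] REPORTED = {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 6, 7, 6, 6, 5, 0, 1,
        5, 0, 2, 5, 2, 5, 1, 2, 1, 5, 2, 7, 5, 2, 0
    };

    // Count how many stored copies exist for each block index (sorted by index).
    static Map<Integer, Integer> countPerIndex(int[] indices) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int index : indices) {
            counts.merge(index, 1, Integer::sum);
        }
        return counts;
    }

    // Return the block indices that are stored exactly once, i.e. have no redundant copy.
    static List<Integer> singleCopyIndices(int[] indices) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : countPerIndex(indices).entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countPerIndex(REPORTED));
        System.out.println(singleCopyIndices(REPORTED));
    }
}
```

The list has 31 entries, which matches the sum of LIVE + DECOMMISSIONING + REDUNDANT in both counter dumps (2 + 7 + 22 and 8 + 7 + 16), and the only indices stored exactly once are 3, 4 and 8.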
> Erasure Coding: replicate block infinitely when datanode being decommissioning
> ------------------------------------------------------------------------------
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Labels: EC, HDFS, NameNode
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC blocks on
> that datanode are replicated infinitely.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes
> simultaneously.
> !scheduleReconstruction.png!
> !fsck-file.png!