[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhao Yi Ming updated HDFS-14699:
--------------------------------
    Summary: Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold  (was: Erasure Coding: Can NOT trigger the reconstruction when have the dup internal blocks and missing one internal block)

> Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14699
>                 URL: https://issues.apache.org/jira/browse/HDFS-14699
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.2.0, 3.1.1, 3.3.0
>            Reporter: Zhao Yi Ming
>            Assignee: Zhao Yi Ming
>            Priority: Critical
>              Labels: patch
>         Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster running Hadoop 3.1.1 and hit the same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881.
> Our testing steps follow; we hope they are helpful. (The DNs mentioned below hold the internal blocks under test.)
> # We customized a new 10-2-1024k policy and applied it to a path. Now we have 12 internal blocks (12 live blocks).
> # Decommission one DN. After the decommission completes, we have 13 internal blocks (12 live blocks and 1 decommissioned block).
> # Shut down one DN that does not hold the same block ID as the decommissioned block. Now we have 12 internal blocks (11 live blocks and 1 decommissioned block).
> # After waiting about 600s (before the heartbeat comes), recommission the decommissioned DN. Now we have 12 internal blocks (11 live blocks and 1 duplicate block).
> # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production env.
> Could you help? Thanks a lot!
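
The block counts in the steps above can be replayed with a small bookkeeping sketch. This is not Hadoop code and does not model the NameNode or BlockManager; it only tracks which internal block index sits on which storage. The helper names (summarize, replay), the choice of index 0 for the decommissioned DN and index 5 for the stopped DN, and the assumption that decommissioning copies the block to another DN are all illustrative and not taken from the report.

```python
from collections import Counter

# From the 10-2-1024k policy in the report: 10 data + 2 parity units.
DATA_UNITS, PARITY_UNITS = 10, 2
TOTAL_UNITS = DATA_UNITS + PARITY_UNITS  # 12 internal blocks per block group


def summarize(storages):
    """Count internal blocks in a list of (block_index, state) pairs.

    state is "live" or "decommissioned"; a stopped DN is modelled by
    removing its entry (hypothetical simplification).
    """
    live = Counter(idx for idx, state in storages if state == "live")
    return {
        "total": len(storages),
        "live_storages": sum(live.values()),
        "distinct_live": len(live),
        "duplicates": sum(n - 1 for n in live.values()),
        "decommissioned": sum(1 for _, s in storages if s == "decommissioned"),
        "missing": sorted(set(range(TOTAL_UNITS)) - set(live)),
    }


def replay():
    """Replay steps 1-4 of the report and return one summary per step."""
    # Step 1: one live internal block per index.
    storages = [(i, "live") for i in range(TOTAL_UNITS)]
    step1 = summarize(storages)

    # Step 2: decommission the DN holding index 0 (arbitrary choice);
    # assume its block is copied to another DN.
    storages[0] = (0, "decommissioned")
    storages.append((0, "live"))
    step2 = summarize(storages)

    # Step 3: shut down the DN holding a different index (5, arbitrary).
    storages.remove((5, "live"))
    step3 = summarize(storages)

    # Step 4: recommission the first DN; index 0 now exists twice.
    storages[0] = (0, "live")
    step4 = summarize(storages)

    return [step1, step2, step3, step4]


if __name__ == "__main__":
    for number, summary in enumerate(replay(), start=1):
        print(f"step {number}: {summary}")
```

After step 4 the sketch reports 12 live storages but only 11 distinct live indices, with one duplicate and one index missing. A check that looks only at the number of live storages would see a full block group, while a check on distinct indices would see that one internal block still needs reconstruction.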