[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042780#comment-17042780 ]

Fei Hui edited comment on HDFS-15186 at 2/23/20 2:52 AM:

[~yaoguangdong] Thanks for reporting this. Good catch. Sorry for the late reply; I haven't been able to receive emails these days.
+1 for [~ayushtkn]'s suggestions. I think index 6 is in neither liveIndices nor busyIndices, and that causes this problem. Maybe we should fix it on the NameNode side.

> Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Critical
> Attachments: HDFS-15186.001.patch
>
>
> I found some parity blocks whose content is all zeros when decommissioning more than one DataNode from a cluster. The probability is high (on the order of parts per thousand). This is a serious problem: if we read data from a zeroed parity block, or use it to recover another block, we end up using corrupt data without knowing it.
> Here are two such cases:
> B: busy DataNode,
> D: decommissioning DataNode,
> others are normal.
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
>
> In the first case, with block group indices [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)], the DN may receive a reconstruct-block command with liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets array (a field of the class StripedReconstructionInfo) of length 2.
> In the current code, a targets length of 2 means the DataNode has to recover 2 internal blocks. But liveIndices reveals only 1 missing block (index 6). StripedWriter#initTargetIndices therefore leaves the second target at its default value, 0, without checking whether index 0 is already among the source indices.
> The EC algorithm is then invoked with source indices [0, 1, 2, 3, 4, 5] to recover target indices [6, 0]. Index 0 therefore appears in both the sources and the targets. In this case the target buffer returned for index 6 is always all zeros. So I think this is also a problem in the EC algorithm, because it should be more fault tolerant.
> I tried to fix it there, but it is too hard because there are too many cases. The second case above is another one: source indices [1, 2, 3, 4, 5, 7] are used to recover target indices [0, 6, 0].
> So I changed my approach: invoke the EC algorithm with correct parameters, which means removing the duplicate target index 0 in these cases. That is how I fixed it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
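The target-selection behaviour described in the report can be reproduced with a small Java sketch. It rests on simplifying assumptions:

- The block group is RS-6-3, so it has 9 internal-block indices.
- `buggyTargetIndices` mimics the described behaviour of StripedWriter#initTargetIndices. Missing indices are taken in order, and any slot left unfilled keeps Java's default value 0.
- `sanitizeTargets` illustrates the idea of the fix: drop any target that is already a source or already listed.

Both helper names and the class are hypothetical. This is neither the Hadoop source nor the committed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified model of DataNode-side target selection (not Hadoop code). */
class TargetIndicesSketch {

    /** Mimics the described bug: unfilled target slots keep the int default 0. */
    static int[] buggyTargetIndices(int[] liveIndices, int numTargets, int groupWidth) {
        BitSet live = new BitSet(groupWidth);
        for (int i : liveIndices) {
            live.set(i);
        }
        int[] targets = new int[numTargets]; // Java zero-initializes this array
        int m = 0;
        for (int i = 0; i < groupWidth && m < numTargets; i++) {
            if (!live.get(i)) {
                targets[m++] = i; // a genuinely missing index
            }
        }
        return targets; // if m < numTargets, the tail still holds index 0
    }

    /** Sketch of the fix idea: drop targets that are sources or repeated targets. */
    static int[] sanitizeTargets(int[] sourceIndices, int[] targetIndices) {
        BitSet seen = new BitSet();
        for (int s : sourceIndices) {
            seen.set(s);
        }
        List<Integer> kept = new ArrayList<>();
        for (int t : targetIndices) {
            if (!seen.get(t)) {
                kept.add(t);
                seen.set(t); // later copies of t count as duplicates
            }
        }
        return kept.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Case 1: index 6 sits on a busy decommissioning DN; 2 targets requested.
        int[] case1 = buggyTargetIndices(new int[] {0, 1, 2, 3, 4, 5, 7, 8}, 2, 9);
        System.out.println(Arrays.toString(case1));
        System.out.println(Arrays.toString(sanitizeTargets(new int[] {0, 1, 2, 3, 4, 5}, case1)));
        // Case 2: indices 0 and 6 sit on busy decommissioning DNs; 3 targets requested.
        int[] case2 = buggyTargetIndices(new int[] {1, 2, 3, 4, 5, 7, 8}, 3, 9);
        System.out.println(Arrays.toString(case2));
        System.out.println(Arrays.toString(sanitizeTargets(new int[] {1, 2, 3, 4, 5, 7}, case2)));
    }
}
```

Running `main` prints `[6, 0]`, `[6]`, `[0, 6, 0]` and `[0, 6]`. These are the two target lists from the report, each before and after the duplicate index 0 is removed.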
[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043286#comment-17043286 ]

Yao Guangdong edited comment on HDFS-15186 at 2/24/20 10:06 AM:

[~ayushtkn] OK, thanks for your advice. [~gjhkael] has already fixed this on the NameNode side in HDFS-14768. I think this issue is a duplicate; you can close it.
[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043133#comment-17043133 ]

Yao Guangdong edited comment on HDFS-15186 at 2/25/20 3:08 AM:

[~ferhui], [~ayushtkn], [~gjhkael] Thanks for your patient reviews. I agree with your point that we should fix it on the NameNode side.

I have one concern, though. In 3-replica mode, when we decommission a DN and that DN is busy, we can copy its blocks from other DNs. In EC mode, however, unless we reconstruct the block, we can only copy it from the decommissioning DN itself.

Suppose one DN holds 1 million blocks, copying a block takes one second, and the hard limit is the default of 4. Decommissioning then takes about 69 hours (1,000,000 / 4 / 3600 ≈ 69 hours). So if we copy all blocks from the decommissioning DN and count busy decommissioning replicas in the live-replica check, won't decommissioning become very slow? Is my understanding right?
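The decommission-time estimate in the comment is plain arithmetic. Here is a minimal Java sketch of it, using the comment's assumptions:

- Every block is copied from the decommissioning DataNode alone.
- Each copy takes one second.
- At most 4 copies run concurrently.

The "hard limit" is presumably the NameNode setting `dfs.namenode.replication.max-streams-hard-limit`, whose default is 4. The class and method names are illustrative only.

```java
/** Back-of-envelope decommission time; all inputs are assumptions taken from the comment. */
class DecommissionEstimate {

    /** Total hours = blocks * seconds per block / concurrent copies / 3600. */
    static double hours(long blocks, double secondsPerBlock, int maxConcurrentCopies) {
        return blocks * secondsPerBlock / maxConcurrentCopies / 3600.0;
    }

    public static void main(String[] args) {
        // 1 million blocks, 1 s per block, 4 concurrent copies per DataNode.
        double h = hours(1_000_000L, 1.0, 4);
        System.out.printf("%.1f hours%n", h);
    }
}
```

With these inputs the sketch prints `69.4 hours`, matching the comment's figure of roughly 69 hours. Under the same simple model, the estimate scales linearly with block count and inversely with the concurrency limit.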
> Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Critical
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch
[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
[ https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846211#comment-17846211 ]

Chenyu Zheng edited comment on HDFS-15186 at 5/24/24 8:41 AM:

Hi all, I have reproduced the EC algorithm problem, as described in HADOOP-19180. Would you mind taking a look at HADOOP-19180?

was (Author: zhengchenyu):
Hi all, I have reproduced the EC algorithm problem, as described in HDFS-17521. Would you mind taking a look at HDFS-17521?

> Erasure Coding: Decommission may generate the parity block's content with all 0 in some case
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch,
> HDFS-15186.003.patch, HDFS-15186.004.patch, HDFS-15186.005.patch