[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
[ https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17094:
--
Component/s: erasure-coding
Hadoop Flags: Reviewed
Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix bug in block recovery when there are stale datanodes
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.4.0
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> When a block recovery occurs, `RecoveryTaskStriped` in the datanode expects
> `rBlock.getLocations()` and `rBlock.getBlockIndices()` to correspond
> one-to-one. However, if some locations are stale when the NameNode handles
> the heartbeat, this correspondence breaks: stale locations are excluded from
> `recoveryLocations`, but the block indices array is still complete (i.e., it
> contains the indices of all the locations). As a result,
> `BlockRecoveryWorker.RecoveryTaskStriped#recover` generates a wrong internal
> block ID, the corresponding datanode cannot find the replica, and the
> recovery fails. This bug needs to be fixed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
[ https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17094:
Description: When a block recovery occurs, `RecoveryTaskStriped` in the datanode expects `rBlock.getLocations()` and `rBlock.getBlockIndices()` to correspond one-to-one. However, if some locations are stale when the NameNode handles the heartbeat, this correspondence breaks: stale locations are excluded from `recoveryLocations`, but the block indices array is still complete (i.e., it contains the indices of all the locations). As a result, `BlockRecoveryWorker.RecoveryTaskStriped#recover` generates a wrong internal block ID, the corresponding datanode cannot find the replica, and the recovery fails. This bug needs to be fixed.
(was: the same text, with "replica" misspelled as "relica")

> EC: Fix bug in block recovery when there are stale datanodes
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
[ https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17094:
--
Labels: pull-request-available (was: )

> EC: Fix bug in block recovery when there are stale datanodes
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
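The index mismatch in the issue description can be illustrated with a small, dependency-free Java sketch. It is hypothetical: the class and method names, the five datanodes `dn0` to `dn4`, and the block group ID `-1024` are invented for illustration, and it assumes (as in the HDFS striped layout) that an internal block's ID is the block group ID plus that block's index within the group. It does not reproduce Hadoop's `BlockRecoveryWorker` code or the actual patch; it only shows why dropping stale locations without dropping their indices shifts every later pairing, and one way to keep the two arrays aligned.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the HDFS-17094 mismatch. Not Hadoop code:
 * all names here are illustrative only.
 */
public class StripedRecoverySketch {

    /** Assumed rule: internal block ID = block group ID + index in group. */
    static long internalBlockId(long blockGroupId, int blockIndex) {
        return blockGroupId + blockIndex;
    }

    /**
     * Buggy pattern: stale locations are removed, but the index array is
     * left complete, so the i-th live location is paired with indices[i].
     */
    static List<String> pairBuggy(long groupId, String[] locations,
                                  byte[] indices, boolean[] stale) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < locations.length; i++) {
            if (!stale[i]) {
                live.add(locations[i]);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < live.size(); i++) {
            // Wrong once an earlier location was skipped: indices[i] belongs
            // to the i-th ORIGINAL location, not the i-th live one.
            pairs.add(live.get(i) + "=" + internalBlockId(groupId, indices[i]));
        }
        return pairs;
    }

    /** Aligned pattern: drop a stale location and its index together. */
    static List<String> pairFixed(long groupId, String[] locations,
                                  byte[] indices, boolean[] stale) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < locations.length; i++) {
            if (!stale[i]) {
                pairs.add(locations[i] + "=" + internalBlockId(groupId, indices[i]));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        long groupId = -1024L; // hypothetical group ID with the low 4 bits clear
        String[] locations = {"dn0", "dn1", "dn2", "dn3", "dn4"};
        byte[] indices = {0, 1, 2, 3, 4};
        boolean[] stale = {false, true, false, false, false}; // dn1 is stale

        System.out.println("buggy: " + pairBuggy(groupId, locations, indices, stale));
        System.out.println("fixed: " + pairFixed(groupId, locations, indices, stale));
    }
}
```

Running `main` prints the buggy pairing `[dn0=-1024, dn2=-1023, dn3=-1022, dn4=-1021]` and the aligned pairing `[dn0=-1024, dn2=-1022, dn3=-1021, dn4=-1020]`. In the buggy case `dn2` would be asked for internal block `-1023`, which belongs to the stale `dn1`, so it cannot find the replica.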