[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17094:
--
  Component/s: erasure-coding
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix bug in block recovery when there are stale datanodes
> 
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When a block recovery occurs, `RecoveryTaskStriped` in datanode expects 
> `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one 
> correspondence. However, if there are locations in stale state when NameNode 
> handles heartbeat, this correspondence will be disrupted. In detail, there is 
> no stale location in `recoveryLocations`, but the block indices array is 
> still complete (i.e. contains the indices of all the locations). This will 
> cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong 
> internal block ID, and the corresponding datanode cannot find the replica, 
> thus making the recovery process fail. This bug needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes

2023-07-18 Thread Shuyan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyan Zhang updated HDFS-17094:

Description: When a block recovery occurs, `RecoveryTaskStriped` in 
datanode expects `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be 
in one-to-one correspondence. However, if there are locations in stale state 
when NameNode handles heartbeat, this correspondence will be disrupted. In 
detail, there is no stale location in `recoveryLocations`, but the block 
indices array is still complete (i.e. contains the indices of all the 
locations). This will cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` 
to generate a wrong internal block ID, and the corresponding datanode cannot 
find the replica, thus making the recovery process fail. This bug needs to be 
fixed.  (was: When a block recovery occurs, `RecoveryTaskStriped` in datanode 
expects `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in 
one-to-one correspondence. However, if there are locations in stale state when 
NameNode handles heartbeat, this correspondence will be disrupted. In detail, 
there is no stale location in `recoveryLocations`, but the block indices array 
is still complete (i.e. contains the indices of all the locations). This will 
cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong 
internal block ID, and the corresponding datanode cannot find the relica, thus 
making the recovery process fail. This bug needs to be fixed.)

> EC: Fix bug in block recovery when there are stale datanodes
> 
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When a block recovery occurs, `RecoveryTaskStriped` in datanode expects 
> `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one 
> correspondence. However, if there are locations in stale state when NameNode 
> handles heartbeat, this correspondence will be disrupted. In detail, there is 
> no stale location in `recoveryLocations`, but the block indices array is 
> still complete (i.e. contains the indices of all the locations). This will 
> cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong 
> internal block ID, and the corresponding datanode cannot find the replica, 
> thus making the recovery process fail. This bug needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes

2023-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17094:
--
Labels: pull-request-available  (was: )

> EC: Fix bug in block recovery when there are stale datanodes
> 
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When a block recovery occurs, `RecoveryTaskStriped` in datanode expects 
> `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one 
> correspondence. However, if there are locations in stale state when NameNode 
> handles heartbeat, this correspondence will be disrupted. In detail, there is 
> no stale location in `recoveryLocations`, but the block indices array is 
> still complete (i.e. contains the indices of all the locations). This will 
> cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong 
> internal block ID, and the corresponding datanode cannot find the relica, 
> thus making the recovery process fail. This bug needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org