[ https://issues.apache.org/jira/browse/HDFS-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721156#comment-17721156 ]

farmmamba commented on HDFS-17003:
----------------------------------

[~hexiaoqiao] Hi, sir. The data loss can be reproduced as below; the main 
reason is given at the end.

Suppose we have data blocks d1-d6 and parity blocks r1-r3 of a file test.txt 
(an RS-6-3 block group).

1. Corrupt two data blocks: echo 0 > d1 and echo 0 > d2.

2. Run hdfs dfs -cat test.txt so that d1 and d2 are reported bad and 
reconstructed, d1 into d1' and d2 into d2'. Because of the NameNode logic 
described below, only d2 is invalidated here, so the corrupt d1 remains.

3. Corrupt the reconstructed blocks with echo 0 > d1' and echo 0 > d2', then 
execute hdfs dfs -cat test.txt to reconstruct d1' into d1'' and d2' into d2''.

4. Then corrupt all parity blocks: echo 0 > r1; echo 0 > r2; echo 0 > r3.

5. Wait a moment; the file is corrupted and can no longer be recovered.
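
For reference, here is a rough MiniDFSCluster sketch of the same sequence 
using the standard HDFS test utilities. It is illustrative only: the policy 
name, file size, class name, and the sleep-based wait are my assumptions, and 
steps 3-5 are abbreviated as comments.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedStripedBlock;
import org.apache.hadoop.hdfs.util.StripedBlockUtil;
import org.junit.Test;

public class TestEcInvalidateWrongBlock {
  @Test
  public void testRepro() throws Exception {
    // RS-6-3 policy: 6 data + 3 parity internal blocks, one DN each.
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(9).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      fs.enableErasureCodingPolicy("RS-6-3-1024k");
      fs.setErasureCodingPolicy(new Path("/"), "RS-6-3-1024k");

      Path file = new Path("/test.txt");
      DFSTestUtil.createFile(fs, file, 6 * 1024 * 1024, (short) 1, 0L);

      // Resolve the internal blocks d1..d6, r1..r3 of the block group.
      LocatedStripedBlock bg = (LocatedStripedBlock) fs.getClient()
          .getLocatedBlocks(file.toString(), 0).get(0);
      LocatedBlock[] internal = StripedBlockUtil.parseStripedBlockGroup(
          bg, 1024 * 1024, 6, 3); // cellSize, dataBlkNum, parityBlkNum

      // Step 1: corrupt d1 and d2 on disk (the "echo 0 > d1/d2" above).
      cluster.corruptBlockOnDataNodes(internal[0].getBlock());
      cluster.corruptBlockOnDataNodes(internal[1].getBlock());

      // Step 2: reading reports the bad replicas and triggers
      // reconstruction (d1 -> d1', d2 -> d2'); per this issue only one
      // corrupt replica is invalidated, so the stale d1 stays on disk.
      DFSTestUtil.readFile(fs, file);
      Thread.sleep(10000); // crude wait for reconstruction to finish

      // Steps 3-4: corrupt d1'/d2' and re-read to get d1''/d2'', then
      // corrupt the parity blocks internal[6..8] (r1-r3) the same way.
      // Step 5: excess-redundancy processing removes the healthy d1''
      // and the group drops below 6 readable blocks: unrecoverable.
    } finally {
      cluster.shutdown();
    }
  }
}{code}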

 

The main reason for this case is that d1 and d1' are not deleted in time; the 
NameNode then detects excess replicas and deletes the correct block d1''.
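
To see why these copies collide: a reconstructed internal block keeps the same 
block ID, which for a striped group is the (negative) block group ID plus the 
internal block index, carried in the low 4 bits. The constant below mirrors 
HDFS's striped block ID layout; the class itself is just a hypothetical demo.
{code:java}
public class StripedIdDemo {
  // Low 4 bits of a striped (negative) block ID carry the internal index.
  static final long BLOCK_GROUP_INDEX_MASK = 15;

  public static void main(String[] args) {
    long groupId = -9223372036848404320L; // block group ID from the NN log
    long d1Id = groupId;                  // index 0 -> blk_...404320
    long d2Id = groupId + 1;              // index 1 -> blk_...404319

    // Reconstruction reuses the same ID, so d1, d1' and d1'' are all
    // blk_...404320 to the NameNode; the extras look like excess
    // redundancy, and excess removal may delete the only healthy copy.
    System.out.println(d2Id & ~BLOCK_GROUP_INDEX_MASK); // the group ID
    System.out.println(d2Id & BLOCK_GROUP_INDEX_MASK);  // internal index = 1
  }
}{code}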

 

For more detail, see the code in the BlockManager#addStoredBlock method:
{code:java}
if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
  invalidateCorruptReplicas(storedBlock, reportedBlock, num);
}{code}
 

If we destroy two data blocks of an EC stripe, HDFS will reconstruct those two 
data blocks and send IBRs to the NameNode, which executes the 
BlockManager#addStoredBlock method. When the second data block's IBR is 
received, the NameNode enters the if condition above, and the parameter passed 
there is reportedBlock. In the invalidateCorruptReplicas method, corrupt 
blocks are added to InvalidateBlocks according to the reportedBlock parameter, 
so this logic never invalidates the corrupt block on the node that sent its 
IBR first.
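
To make that concrete, here is an abridged paraphrase of 
BlockManager#invalidateCorruptReplicas (based on current trunk, with 
bookkeeping and error handling trimmed). Every corrupt node is invalidated 
with the single reportedBlock, even though in a striped group each node holds 
an internal block with a different ID:
{code:java}
// Abridged paraphrase of BlockManager#invalidateCorruptReplicas.
private void invalidateCorruptReplicas(BlockInfo blk, Block reported,
    NumberReplicas numberReplicas) {
  Collection<DatanodeDescriptor> nodes = corruptReplicas.getNodes(blk);
  if (nodes == null) {
    return;
  }
  // Copy to avoid ConcurrentModificationException during removal.
  for (DatanodeDescriptor node :
      nodes.toArray(new DatanodeDescriptor[nodes.size()])) {
    // Problem for striped blocks: "reported" is the internal block from
    // the latest IBR. The node that reported first holds an internal
    // block with a different ID, so it is told to delete a replica it
    // does not have ("ReplicaInfo not found"), while its actually
    // corrupt block is never invalidated.
    invalidateBlock(new BlockToMarkCorrupt(reported, blk, null,
        Reason.ANY), node, numberReplicas);
  }
}{code}
This matches the datanode1 log in the description: it reported 
blk_-9223372036848404320 but was later asked to delete 
blk_-9223372036848404319, which it never stored.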

 

> Erasure coding: invalidate wrong block after reporting bad blocks from 
> datanode
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17003
>                 URL: https://issues.apache.org/jira/browse/HDFS-17003
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: farmmamba
>            Priority: Critical
>
> After receiving a reportBadBlocks RPC from a datanode, the NameNode computes 
> the wrong block to invalidate. This is dangerous behaviour and may cause 
> data loss. Some logs from our production cluster are below:
>  
> NameNode log:
> {code:java}
> 2023-05-08 21:23:49,112 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on datanode: 
> datanode1:50010
> 2023-05-08 21:23:49,183 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404319_1471186 on datanode: 
> datanode2:50010{code}
> datanode1 log:
> {code:java}
> 2023-05-08 21:23:49,088 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on 
> /data7/hadoop/hdfs/datanode
> 2023-05-08 21:24:00,509 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed 
> to delete replica blk_-9223372036848404319_1471186: ReplicaInfo not 
> found.{code}
>  
> This phenomenon can be reproduced.


