[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

Wei-Chiu Chuang (Jira) Wed, 09 Dec 2020 16:55:38 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246935#comment-17246935
 ]


Wei-Chiu Chuang commented on HDFS-15170:
----------------------------------------

 

  
{code:java}
+        // If the block is an EC block, the whole block group is marked
+        // corrupted, so if this block is getting deleted, remove the block
+        // group from corrupt replica map explicitly, since removal of the
+        // block from corrupt replicas may be delayed if the blocks are on
+        // stale storage due to failover or any other reason.
+        corruptReplicas.removeFromCorruptReplicasMap(b.getStored(), node);
^^^^^^^^^^^^^
{code}
Is this line needed when 
dfs.namenode.corrupt.block.delete.immediately.enabled is true? In that case 
the replica will be removed later by invalidateBlock() anyway.
 Or do we want to call {{corruptReplicas.removeFromCorruptReplicasMap(corrupt, 
node)}} instead? b.getStored() is the internal block, whereas corrupt is the EC 
block group id, so the code doesn't seem to match the comment.
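
To make the concern concrete, here is a minimal, self-contained sketch. It is a toy model, not the HDFS CorruptReplicasMap: the class CorruptMapSketch, the ids and the node name "dn-2" are all hypothetical, and it assumes the map is keyed by the block group id. Under that assumption, a removal keyed by the internal block id is a no-op and the group stays marked corrupt.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a corrupt-replica map; NOT the real HDFS class. */
public class CorruptMapSketch {
    // Hypothetical layout: block id -> datanodes holding a corrupt replica.
    private final Map<Long, Set<String>> corrupt = new HashMap<>();

    public void add(long blockId, String node) {
        corrupt.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
    }

    /** Returns true only if an entry for exactly this key and node existed. */
    public boolean remove(long blockId, String node) {
        Set<String> nodes = corrupt.get(blockId);
        if (nodes == null || !nodes.remove(node)) {
            return false;
        }
        if (nodes.isEmpty()) {
            corrupt.remove(blockId);
        }
        return true;
    }

    public boolean isCorrupt(long blockId) {
        return corrupt.containsKey(blockId);
    }

    public static void main(String[] args) {
        long groupId = -1024L;          // hypothetical block group id
        long internalId = groupId + 2;  // hypothetical internal block id
        CorruptMapSketch map = new CorruptMapSketch();
        map.add(groupId, "dn-2");       // the whole group is marked corrupt

        // Removing with the internal block id misses the entry.
        System.out.println(map.remove(internalId, "dn-2")); // prints false
        System.out.println(map.isCorrupt(groupId));         // prints true

        // Removing with the group id clears it.
        System.out.println(map.remove(groupId, "dn-2"));    // prints true
        System.out.println(map.isCorrupt(groupId));         // prints false
    }
}
```

Whether the real map behaves this way depends on which key removeFromCorruptReplicasMap() resolves for striped blocks, which is exactly what needs to be confirmed against the patch.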

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15170
>                 URL: https://issues.apache.org/jira/browse/HDFS-15170
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>         Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to reproduce:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. After pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode that was stopped in step 2.
> The block report (BR) from the datanode stopped in step 2 will mark the 
> block as corrupt, and invalidating the block won't remove it, since after 
> failover the blocks would be on stale storage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

