liuguanghua created HDFS-17048:
----------------------------------

             Summary: FSNamesystem.delete() may cause data residue when the 
active NameNode crashes or shuts down 
                 Key: HDFS-17048
                 URL: https://issues.apache.org/jira/browse/HDFS-17048
             Project: Hadoop HDFS
          Issue Type: Bug
         Environment: 
            Reporter: liuguanghua


Consider the following scenario:

(1) A user deletes an HDFS directory containing many blocks.

(2) Then the active NameNode crashes, shuts down, or is failed over to the 
standby NameNode by an administrator.

(3) This may result in residual block data on the DataNodes.

FSNamesystem.delete() works as follows:

(1) It deletes the directory from the namespace first.

(2) It adds toRemovedBlocks into markedDeleteQueue.

(3) The MarkedDeleteBlockScrubber thread consumes markedDeleteQueue and 
deletes the blocks.

If the active NameNode crashes, the blocks remaining in markedDeleteQueue are 
lost and will never be deleted. These blocks can no longer be found via the 
hdfs fsck command, but they still occupy space on the DataNode disks.
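The steps above can be sketched as a minimal in-memory model (the class and 
member names here are hypothetical, not the real HDFS implementation): the 
delete path mutates the namespace synchronously, but block removal is only 
queued in process memory, so a crash before the scrubber drains the queue 
leaves orphaned blocks on DataNode disks.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Simplified sketch of the pattern described in the issue. All names are
// illustrative; this is NOT the actual FSNamesystem code.
public class MarkedDeleteSketch {
    // In-memory only: the queue contents are not persisted, so they
    // vanish if the NameNode process dies before the scrubber runs.
    private final Queue<String> markedDeleteQueue = new ArrayDeque<>();
    private final Set<String> namespace = new HashSet<>();
    private final Set<String> dataNodeBlocks = new HashSet<>();

    public void createFile(String path, String blockId) {
        namespace.add(path);
        dataNodeBlocks.add(blockId);
    }

    // Steps (1) and (2): remove from the namespace synchronously,
    // then enqueue the block for asynchronous deletion.
    public void delete(String path, String blockId) {
        namespace.remove(path);
        markedDeleteQueue.add(blockId);
    }

    // Step (3): the scrubber thread's work, run synchronously here.
    public void runScrubber() {
        String blockId;
        while ((blockId = markedDeleteQueue.poll()) != null) {
            dataNodeBlocks.remove(blockId);
        }
    }

    // Simulate a NameNode crash: the in-memory queue is simply gone,
    // but the DataNode disks (dataNodeBlocks) are untouched.
    public void crash() {
        markedDeleteQueue.clear();
    }

    public boolean namespaceHas(String path) {
        return namespace.contains(path);
    }

    public boolean dataNodeHas(String blockId) {
        return dataNodeBlocks.contains(blockId);
    }
}
```

If crash() fires between delete() and runScrubber(), the file is gone from the 
namespace (so fsck cannot see it) while its block survives on the DataNode, 
which is exactly the residue described above.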

 

Thus:

SummaryA = hdfs dfs -du -s /

SummaryB = sum(dfsUsed reported by each DataNode)

SummaryA < SummaryB

 

This may be unavoidable. But is there a tool to find the blocks that should 
have been deleted and clean them up?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
