[ 
https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278778#comment-13278778
 ] 

Ashish Singhi commented on HDFS-3157:
-------------------------------------

Patch updated and ready for review.
Please provide your review comments.
{code}
+      /*
+       * Look up again to storedBlock as it might be a reported block also.
+       * @see BlockManager#checkReplicaCorrupt(...)
+       */
+      BlockInfo blkInfo = blocksMap.getStoredBlock(storedBlock);
{code}
I have added this because in updatedNeededReplication we want namenode to ask 
datanode to replicate the storedBlock which is there in its blockMap not with 
the reported block with datanode is reporting as corrupt.

Now in the test case I am asserting 3 things, 
First - There should be one block in the corruptReplicasMap.
Second - After marking the block as corrupt, ReplicationMonitor thread should 
replicate a live replica to one of the datanode.
Third - After replicating the live replica, the corrupt replica in 
corruptReplicasMap should get invalidated.
                
> Error in deleting block is keep on coming from DN even after the block report 
> and directory scanning has happened
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3157
>                 URL: https://issues.apache.org/jira/browse/HDFS-3157
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: J.Andreina
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0, 3.0.0
>
>         Attachments: HDFS-3157-1.patch, HDFS-3157.patch, HDFS-3157.patch, 
> HDFS-3157.patch
>
>
> Cluster setup:
> 1NN,Three DN(DN1,DN2,DN3),replication factor-2,"dfs.blockreport.intervalMsec" 
> 300,"dfs.datanode.directoryscan.interval" 1
> step 1: write one file "a.txt" with sync(not closed)
> step 2: Delete the blocks in one of the datanode say DN1(from rbw) to which 
> replication happened.
> step 3: close the file.
> Since the replication factor is 2 the blocks are replicated to the other 
> datanode.
> Then at the NN side the following cmd is issued to DN from which the block is 
> deleted
> -------------------------------------------------------------------------------------
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX 
> because reported RBW replica with genstamp 1002 does not match COMPLETE 
> block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> Removing block blk_2903555284838653156_1003 from neededReplications as it has 
> enough replicas.
> {noformat}
> From the datanode side in which the block is deleted the following exception 
> occured
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unexpected error trying to delete block blk_2903555284838653156_1003. 
> BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
>       at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
>       at java.lang.Thread.run(Thread.java:619)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to