[jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened

Uma Maheswara Rao G (JIRA) Sun, 01 Jul 2012 12:08:50 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404788#comment-13404788
 ]


Uma Maheswara Rao G commented on HDFS-3157:
-------------------------------------------

Hi Nicholas,

Latest Patch looks great. I have one comment:

{code}
 (corrupted == stored?
{code}
This should be .equals? as we creating new reference of BlockInfo explicitly in 
some of the ctors right?

And other question is:
if (countNodes(b.stored).liveReplicas() >= bc.getReplication()) {
This point may not be related to this patch, but considering one case I wanted 
to point it.
Due to several pipeline failure in cluster, only 2 live replicas present in the 
cluster and all other nodes has the partial block(corrupt) present in RBW.
Now NN can not invalidat that blocks as it did not meet the enough replication 
and may try to replicate them to other nodes first. But unfortunately other 
nodes already have the block with older genstamp. volumes map may have that 
blocks already and I remember it will reject the replication. So, we have only 
2 live replicas even though we have more DNs. But this situation should be very 
rare and almost no possibility in bigger clusters. Worth considering the case 
for small clusters. Brahma reported this in one small cluster of 5 nodes. 
Anyway I will ask him to file separate one, we can discuss there.


Also Thanks a lot Ashish for your efforts on this issue :-)

Thanks
Uma
                
> Error in deleting block is keep on coming from DN even after the block report 
> and directory scanning has happened
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3157
>                 URL: https://issues.apache.org/jira/browse/HDFS-3157
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: J.Andreina
>            Assignee: Ashish Singhi
>         Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch, 
> HDFS-3157-3.patch, HDFS-3157-3.patch, HDFS-3157-4.patch, HDFS-3157-5.patch, 
> HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch, h3157_20120618.patch
>
>
> Cluster setup:
> 1NN,Three DN(DN1,DN2,DN3),replication factor-2,"dfs.blockreport.intervalMsec" 
> 300,"dfs.datanode.directoryscan.interval" 1
> step 1: write one file "a.txt" with sync(not closed)
> step 2: Delete the blocks in one of the datanode say DN1(from rbw) to which 
> replication happened.
> step 3: close the file.
> Since the replication factor is 2 the blocks are replicated to the other 
> datanode.
> Then at the NN side the following cmd is issued to DN from which the block is 
> deleted
> -------------------------------------------------------------------------------------
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX 
> because reported RBW replica with genstamp 1002 does not match COMPLETE 
> block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> Removing block blk_2903555284838653156_1003 from neededReplications as it has 
> enough replicas.
> {noformat}
> From the datanode side in which the block is deleted the following exception 
> occured
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unexpected error trying to delete block blk_2903555284838653156_1003. 
> BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
>       at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
>       at java.lang.Thread.run(Thread.java:619)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened

Reply via email to