[ https://issues.apache.org/jira/browse/HDFS-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576530#comment-13576530 ]

Uma Maheswara Rao G commented on HDFS-4482:
-------------------------------------------

@Kihwal, Agreed. The current implementation of the default placement policy 
does not use the src path anywhere. I am not sure about the motivation behind 
that API design; one point may be that users may want to use the src path in 
their custom placement policies. Since the placement policy is exposed as a 
pluggable interface, users can write their own policy and plug it in, and 
their implementation might use that path for some purpose. So I don't think 
we can change that API now.
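For illustration, a sketch of such a custom policy is below; the interface 
and method names here are hypothetical stand-ins, not the actual 
BlockPlacementPolicy signatures.
{code}
import java.util.List;

// Hypothetical, simplified placement-policy contract (not the real Hadoop
// API): chooseTarget receives the src path, so a custom implementation can
// use it even though the default policy ignores it.
interface PlacementPolicy {
  List<String> chooseTarget(String srcPath, int numOfReplicas,
                            List<String> liveNodes);
}

class PathAwarePolicy implements PlacementPolicy {
  @Override
  public List<String> chooseTarget(String srcPath, int numOfReplicas,
                                   List<String> liveNodes) {
    int n = Math.min(numOfReplicas, liveNodes.size());
    if (srcPath != null && srcPath.startsWith("/hot/")) {
      // e.g. pin replicas of "hot" paths to the head of the live-node list
      return liveNodes.subList(0, n);
    }
    // placeholder for the usual rack-aware default selection
    return liveNodes.subList(liveNodes.size() - n, liveNodes.size());
  }
}
{code}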
                
> ReplicationMonitor thread can exit with NPE due to the race between delete and replication of same file.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4482
>                 URL: https://issues.apache.org/jira/browse/HDFS-4482
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.1-alpha
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Blocker
>         Attachments: HDFS-4482.patch
>
>
> Trace:
> {noformat}
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1442)
>       at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:269)
>       at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:163)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:131)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1157)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1063)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3085)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3047)
>       at java.lang.Thread.run(Thread.java:619)
> {noformat}
> What I am seeing here is:
> 1) Create a file and write it with 2 DNs.
> 2) Close the file.
> 3) Kill one DN.
> 4) Let replication start.
>   Info:
> {code}
>       // choose replication targets: NOT HOLDING THE GLOBAL LOCK
>       // It is costly to extract the filename for which chooseTargets is called,
>       // so for now we pass in the block collection itself.
>       rw.targets = blockplacement.chooseTarget(rw.bc,
>           rw.additionalReplRequired, rw.srcNode, rw.liveReplicaNodes,
>           excludedNodes, rw.block.getNumBytes());
> {code}
> Here we are choosing the targets outside the global lock. Inside 
> chooseTarget, we try to get the src path from the blockCollection (which is 
> nothing but the INodeFile here). See the code for 
> FSDirectory#getFullPathName: a first loop walks up the parents to compute 
> the depth, and a second loop walks up the parents again to collect the 
> names.
> 5) Before the second loop in FSDirectory#getFullPathName runs, if the file 
> is deleted by a client, the parent will have been set to null. Accessing 
> the parent then throws an NPE, because none of this is done under the lock.
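> To make the race window concrete, here is a minimal standalone sketch (a 
> hypothetical, simplified INode, not the real Hadoop class) of the two-loop 
> shape of getFullPathName and the point where a concurrent delete can null 
> out a parent:
> {code}
> // Hypothetical, simplified INode (not the real Hadoop class) showing the
> // two-loop structure described above and where the NPE can occur.
> class INode {
>   volatile INode parent; // a concurrent delete sets this to null
>   final String name;
>   INode(String name, INode parent) { this.name = name; this.parent = parent; }
>
>   String getFullPathName() {
>     // Loop 1: walk up the parents to compute the depth.
>     int depth = 0;
>     for (INode n = this; n != null; n = n.parent) depth++;
>     // Race window: with no lock held, a delete on another thread can set
>     // parent = null for a node on this path right here.
>     // Loop 2: walk up again to collect the names. If a parent vanished in
>     // between, n becomes null before k reaches 0.
>     String[] names = new String[depth];
>     INode n = this;
>     for (int k = depth - 1; k >= 0; k--) {
>       names[k] = n.name; // NullPointerException if n is already null
>       n = n.parent;
>     }
>     return String.join("/", names);
>   }
> }
> {code}
> In the sketch, holding the lock across both loops, or tolerating a null 
> parent in the second loop, would close this particular window.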
> [~brahmareddy] reported this issue.
