[jira] [Comment Edited] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-01-08 Thread Manoj Govindassamy (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317352#comment-16317352 ]

Manoj Govindassamy edited comment on HDFS-12985 at 1/9/18 12:31 AM:


Thanks for the review [~yzhangal]. Committed it to trunk and branch-2. 


was (Author: manojg):
Thanks for the review [~yzhangal]. Committed it to trunk. 

> NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12985
>                 URL: https://issues.apache.org/jira/browse/HDFS-12985
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 3.1.0, 2.10.0
>
>         Attachments: HDFS-12985.01.patch
>
>
> NameNode crashes repeatedly with an NPE at startup when trying to find the 
> total number of under-construction blocks. The crash happens after an open 
> file that was also part of a snapshot gets deleted along with the snapshot.
> {noformat}
> Failed to start namenode.
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}
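
The trace above is consistent with a lease that still refers to a file whose inode has already been removed. The following is a minimal sketch of that failure mode using a toy model, not Hadoop's actual classes: DanglingLeaseDemo, OpenFile, and every method name below are hypothetical and only illustrate how a lookup on a dangling lease can surface as an NPE while counting under-construction blocks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: these classes are NOT Hadoop's LeaseManager or inode map. */
class DanglingLeaseDemo {
    /** Hypothetical stand-in for a file that is open for write. */
    static final class OpenFile {
        final int blockCount;
        OpenFile(int blockCount) { this.blockCount = blockCount; }
    }

    final Map<Long, OpenFile> inodeMap = new HashMap<>(); // inode id -> file
    final Set<Long> leasedInodeIds = new HashSet<>();     // inode ids that hold a lease

    void openForWrite(long id, int blocks) {
        inodeMap.put(id, new OpenFile(blocks));
        leasedInodeIds.add(id);
    }

    /** Buggy delete: drops the inode but leaves its lease behind. */
    void deleteWithoutLeaseCleanup(long id) {
        inodeMap.remove(id);
    }

    /** Sums the blocks of all leased files; a dangling lease triggers an NPE here. */
    long countUnderConstructionBlocks() {
        long total = 0;
        for (long id : leasedInodeIds) {
            OpenFile f = inodeMap.get(id); // null when the inode is gone
            total += f.blockCount;         // dereferencing null throws NullPointerException
        }
        return total;
    }

    public static void main(String[] args) {
        DanglingLeaseDemo nn = new DanglingLeaseDemo();
        nn.openForWrite(1L, 3);
        nn.deleteWithoutLeaseCleanup(1L);
        try {
            nn.countUnderConstructionBlocks();
        } catch (NullPointerException e) {
            System.out.println("NPE caused by a dangling lease");
        }
    }
}
```

In this model the count works while the inode exists and fails only after the inode is removed without releasing its lease, which would repeat on every restart as long as the stale lease is reloaded.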



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-01-04 Thread Manoj Govindassamy (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311928#comment-16311928 ]

Manoj Govindassamy edited comment on HDFS-12985 at 1/4/18 7:46 PM:
---

Attached v01 to address the following:
1. {{INodeFile#cleanSubtree()}} now updates {{ReclaimContext#removedUCFiles}} 
after deleting the snapshot file.
2. {{FSDirDeleteOp#deleteInternal}} already takes care of removing the leases 
for removedUCFiles and removedINodes.
3. New unit test {{TestOpenFilesWithSnapshot#testOpenFileDeletionAndNNRestart}} 
added to demonstrate the problem and verify the fix.
[~yzhangal], [~eddyxu], can you please take a look at the patch?
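
The first two points can be sketched as follows. This is a toy model under stated assumptions, not the HDFS-12985 patch: the names cleanSubtree, deleteInternal, ReclaimContext, and removedUCFiles are borrowed from the comment, but the class ReclaimSketch, the signatures, and the data structures are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: names echo the comment, the implementation is hypothetical. */
class ReclaimSketch {
    /** Simplified stand-in for a reclaim context: records what a delete removed. */
    static final class ReclaimContext {
        final List<Long> removedUCFiles = new ArrayList<>();
    }

    final Map<Long, Integer> inodeBlocks = new HashMap<>(); // inode id -> block count
    final Set<Long> leases = new HashSet<>();               // inode ids that hold a lease

    void openForWrite(long id, int blocks) {
        inodeBlocks.put(id, blocks);
        leases.add(id);
    }

    /** Point 1: cleanup records each under-construction file it removes. */
    void cleanSubtree(long id, ReclaimContext ctx) {
        if (inodeBlocks.remove(id) != null && leases.contains(id)) {
            ctx.removedUCFiles.add(id);
        }
    }

    /** Point 2: the delete path releases leases for every recorded file. */
    void deleteInternal(long id) {
        ReclaimContext ctx = new ReclaimContext();
        cleanSubtree(id, ctx);
        for (long removed : ctx.removedUCFiles) {
            leases.remove(removed);
        }
    }

    /** Safe as long as every remaining lease still has an inode. */
    long countUnderConstructionBlocks() {
        long total = 0;
        for (long id : leases) {
            total += inodeBlocks.get(id);
        }
        return total;
    }

    public static void main(String[] args) {
        ReclaimSketch nn = new ReclaimSketch();
        nn.openForWrite(1L, 3);
        nn.openForWrite(2L, 2);
        nn.deleteInternal(1L);
        System.out.println(nn.countUnderConstructionBlocks());
    }
}
```

The design point is that the cleanup step only reports what it removed, and a single caller owns lease release, so no lease outlives its inode.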


was (Author: manojg):
Attached v01 to address the following:
1. {{INodeFile#cleanSubtree()}} now updates {{ReclaimContext#removedUCFiles}} 
after deleting the snapshot file.
2. {{FSDirDeleteOp#deleteInternal}} already takes care of removing the leases 
for removedUCFiles and removedINodes.
3. New unit test {{TestOpenFilesWithSnapshot#testOpenFileDeletionAndNNRestart}} 
added to demonstrate the problem and verify the fix.



