[
https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198832#comment-13198832
]
Suresh Srinivas commented on HDFS-2877:
---------------------------------------
Is it possible for the lock to linger for some reason even though the NN process
was killed? If so, can we add a descriptive error message that tells an admin
how to get around it after ensuring that no namenode process is running?
Isn't the patch as simple as:
{noformat}
try {
res = file.getChannel().tryLock();
+ lockF.deleteOnExit();
} catch(OverlappingFileLockException oe) {
...
} catch(IOException e) {
...
}
{noformat}
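To spell out the idea, here is a minimal self-contained sketch, not the actual {{Storage}} code. The class name {{LockSketch}} and the lock file name {{in_use.lock}} are assumptions made for illustration. One detail worth noting: {{FileChannel.tryLock()}} returns null, rather than throwing, when another process holds the lock, so the sketch registers {{deleteOnExit()}} only after checking for a non-null result.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical helper class; not part of Hadoop.
class LockSketch {
    /**
     * Tries to lock the given storage directory.
     * Returns the lock, or null if the directory is already locked.
     */
    static FileLock tryLock(File dir) throws IOException {
        // Lock file name is an assumption for this sketch.
        File lockF = new File(dir, "in_use.lock");
        RandomAccessFile file = new RandomAccessFile(lockF, "rws");
        FileLock res;
        try {
            res = file.getChannel().tryLock();
        } catch (OverlappingFileLockException oe) {
            // The lock is already held inside this JVM.
            file.close();
            return null;
        } catch (IOException e) {
            file.close();
            throw e;
        }
        if (res == null) {
            // Another process holds the lock: leave its lock file alone.
            file.close();
            return null;
        }
        // We own the lock, so it is safe to clean up the file when we exit.
        lockF.deleteOnExit();
        return res;
    }
}
```

With this ordering, a failed attempt never schedules the other holder's lock file for deletion, so a later start attempt still sees the directory as locked.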
> If locking of a storage dir fails, it will remove the other NN's lock file on
> exit
> ----------------------------------------------------------------------------------
>
> Key: HDFS-2877
> URL: https://issues.apache.org/jira/browse/HDFS-2877
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0, 1.0.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-2877.txt
>
>
> In {{Storage.tryLock()}}, we call {{lockF.deleteOnExit()}} regardless of
> whether we successfully lock the directory. So, if another NN has the
> directory locked, then we'll fail to lock it the first time we start another
> NN. But our failed start attempt will still remove the other NN's lockfile,
> and a second attempt will erroneously start.