[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199672#comment-13199672 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Hdfs-trunk #945 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/945/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239880
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java

If locking of a storage dir fails, it will remove the other NN's lock file on exit
----------------------------------------------------------------------------------

Key: HDFS-2877
URL: https://issues.apache.org/jira/browse/HDFS-2877
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0, 0.24.0, 1.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 0.24.0, 0.23.1, 1.1.0, 0.22.1
Attachments: hdfs-2877.txt

In {{Storage.tryLock()}}, we call {{lockF.deleteOnExit()}} regardless of whether we successfully lock the directory. So, if another NN has the directory locked, then we'll fail to lock it the first time we start another NN. But our failed start attempt will still remove the other NN's lockfile, and a second attempt will erroneously start.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199683#comment-13199683 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #158 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/158/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239879
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199722#comment-13199722 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #180 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/180/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239879
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199733#comment-13199733 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Mapreduce-trunk #978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/978/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239880
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198624#comment-13198624 ]

Konstantin Shvachko commented on HDFS-2877:
-------------------------------------------

As for the unit test: I thought it should just start one NN, and then try to start another in the same directory twice. Both of those starts should fail. The bug now is that the second start succeeds, right? Why is it hard?
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198645#comment-13198645 ]

Uma Maheswara Rao G commented on HDFS-2877:
-------------------------------------------

deleteOnExit only deletes the file on JVM exit, right? But in the tests we operate all restarts in the same JVM. I feel that is the difficulty here. Your proposed test would pass without this fix as well, since the restarts all happen in the same JVM and it never exits. Am I missing something?
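The timing point in this comment can be seen in isolation with a small sketch. This is only an illustration, not code from the patch or from the HDFS test suite: the class name, the temporary directory, and the `in_use.lock` file name are assumptions made for the example. It registers a file with `deleteOnExit()` and checks that the file is still present while the JVM keeps running, which is why a test that restarts NNs inside one JVM would not observe the deletion.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo class; not part of Hadoop.
final class DeleteOnExitTiming {

    /**
     * Creates a file, registers it with deleteOnExit(), and reports whether
     * it is still on disk afterwards. The deletion is deferred until the JVM
     * shuts down normally, so nothing is removed while this method runs.
     */
    static boolean survivesWhileJvmRuns() {
        try {
            File dir = Files.createTempDirectory("storage-sketch").toFile();
            // Registered first, so it is deleted last (reverse order on exit).
            dir.deleteOnExit();

            File lockF = new File(dir, "in_use.lock"); // illustrative name
            if (!lockF.createNewFile()) {
                throw new IOException("could not create " + lockF);
            }
            lockF.deleteOnExit();

            return lockF.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("still present: " + survivesWhileJvmRuns());
    }
}
```

Observing the actual deletion would need a separate JVM process that exits, which is what makes a plain in-process unit test awkward here.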
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198832#comment-13198832 ]

Suresh Srinivas commented on HDFS-2877:
---------------------------------------

Is it possible for the lock to linger for some reason even though the NN process was killed? If so, can we add a descriptive error message that explains how an admin can get around it after ensuring no namenode process is running?

Isn't the patch as simple as:
{noformat}
 try {
   res = file.getChannel().tryLock();
+  lockF.deleteOnExit();
 } catch(OverlappingFileLockException oe) {
   ...
 } catch(IOException e) {
   ...
 }
{noformat}
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198835#comment-13198835 ]

Suresh Srinivas commented on HDFS-2877:
---------------------------------------

In my code in the prior comment, you still need the null check before deleteOnExit(). But checking if file exists etc. seems unnecessary.
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198838#comment-13198838 ]

Suresh Srinivas commented on HDFS-2877:
---------------------------------------

BTW is this not a duplicate of HDFS-2865? Should HDFS-2865 be closed as duplicate?
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198891#comment-13198891 ]

Uma Maheswara Rao G commented on HDFS-2877:
-------------------------------------------

@Suresh, your proposed code looks good and simple. But it may not handle one cleanup case: we have just created the lock file, and tryLock throws an IOException due to some IO error. Then we would not clean up the file that we created here, right? The main purpose of deleteOnExit here is to clean up the file that we created. Am I missing something?
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198983#comment-13198983 ]

Todd Lipcon commented on HDFS-2877:
-----------------------------------

Uma's got it -- the RandomAccessFile constructor will create the file, but then, if we fail to lock that file we just created, we wouldn't clean it up (e.g., if the underlying system is an NFS mount without NLM).
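The two cleanup cases raised in this exchange can be put side by side in a short sketch. It is written in the spirit of the discussion and is not the committed patch: `StorageLockSketch`, `Attempt`, `cleanupRegistered`, and the `in_use.lock` name are hypothetical names chosen for the example. The idea sketched is to register `deleteOnExit()` up front only when the lock file does not exist yet (so a file we create is cleaned up even if locking then fails), and otherwise only after the lock has actually been acquired.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;

// Hypothetical sketch; names and structure are assumptions, not Hadoop code.
final class StorageLockSketch {

    /** Outcome of one lock attempt (illustrative helper type). */
    static final class Attempt {
        final FileLock lock;             // null if someone else holds the lock
        final boolean cleanupRegistered; // true if deleteOnExit() was called

        Attempt(FileLock lock, boolean cleanupRegistered) {
            this.lock = lock;
            this.cleanupRegistered = cleanupRegistered;
        }
    }

    static Attempt tryLock(File root) throws IOException {
        File lockF = new File(root, "in_use.lock"); // illustrative name
        boolean cleanupRegistered = false;
        if (!lockF.exists()) {
            // We are about to create the file, so it is ours to clean up,
            // even if locking it fails afterwards (the NFS-without-NLM case).
            lockF.deleteOnExit();
            cleanupRegistered = true;
        }
        RandomAccessFile file = new RandomAccessFile(lockF, "rws");
        FileLock res;
        try {
            res = file.getChannel().tryLock();
        } catch (OverlappingFileLockException oe) {
            // Already locked from within this JVM. Caveat: on some platforms
            // closing any channel on a file drops all of the JVM's locks on it.
            file.close();
            return new Attempt(null, cleanupRegistered);
        } catch (IOException e) {
            file.close();
            throw e;
        }
        if (res == null) {
            // Another process holds the lock; leave its file alone.
            file.close();
            return new Attempt(null, cleanupRegistered);
        }
        if (!cleanupRegistered) {
            // The file pre-existed, but we now own the lock, so we clean up.
            lockF.deleteOnExit();
            cleanupRegistered = true;
        }
        return new Attempt(res, cleanupRegistered);
    }

    /** Runs two attempts on one fresh directory and summarizes them. */
    static String demo() {
        try {
            File root = Files.createTempDirectory("storage-lock-sketch").toFile();
            root.deleteOnExit();
            Attempt first = tryLock(root);
            Attempt second = tryLock(root);
            String summary =
                "first: locked=" + (first.lock != null)
                + " cleanup=" + first.cleanupRegistered
                + "; second: locked=" + (second.lock != null)
                + " cleanup=" + second.cleanupRegistered;
            first.lock.release();
            first.lock.channel().close();
            return summary;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In `demo()` both attempts run in one JVM, so the second one fails through `OverlappingFileLockException` rather than a `null` return; the point is only that a failed attempt against an existing lock file does not register that file for deletion.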
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199041#comment-13199041 ]

Hari Mankude commented on HDFS-2877:
------------------------------------

Also, in a non-HA situation, if the namenode dies and comes back up, will it always require admin intervention?
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199054#comment-13199054 ]

Todd Lipcon commented on HDFS-2877:
-----------------------------------

No, because on a local disk, if the process crashes, the file lock will be dropped, and the new process can re-lock it.
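A rough in-process analogue of this answer is sketched below. It does not crash a process; instead it assumes that closing the holder's file stands in for the holder going away, since closing the channel releases its lock. The class and method names are hypothetical. The sketch shows that a leftover lock file on disk does not by itself block a new holder: what matters is whether a lock is currently held on it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;

// Hypothetical demo class; not part of Hadoop.
final class RelockAfterRelease {

    /**
     * Locks a file, lets the holder go away (by closing it), and then tries
     * to lock the same, still existing file again through a new channel.
     */
    static boolean canRelockAfterHolderGoesAway() {
        try {
            File lockF = File.createTempFile("in_use", ".lock");
            lockF.deleteOnExit();

            try (RandomAccessFile holder = new RandomAccessFile(lockF, "rws")) {
                FileLock held = holder.getChannel().tryLock();
                if (held == null) {
                    throw new IOException("initial lock was not acquired");
                }
            } // closing the file closes its channel, which releases the lock

            try (RandomAccessFile next = new RandomAccessFile(lockF, "rws")) {
                FileLock relocked = next.getChannel().tryLock();
                // The stale file is still on disk, yet the lock is obtainable.
                return lockF.exists() && relocked != null && relocked.isValid();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("re-lock succeeded: " + canRelockAfterHolderGoesAway());
    }
}
```

Whether locks are also dropped promptly on network filesystems depends on the lock manager in use, which is outside what this sketch can show.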
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199232#comment-13199232 ]

Aaron T. Myers commented on HDFS-2877:
--------------------------------------

+1, the patch looks good to me.
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199290#comment-13199290 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1718 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1718/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239880
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199293#comment-13199293 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #1647 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1647/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239880
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199296#comment-13199296 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #464 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/464/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239879
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199303#comment-13199303 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Common-0.23-Commit #474 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/474/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239879
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199337#comment-13199337 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #488 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/488/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239879
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199336#comment-13199336 ]

Hudson commented on HDFS-2877:
------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1662/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239880
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199406#comment-13199406 ]
Hudson commented on HDFS-2877:
Integrated in Hadoop-Hdfs-22-branch #125 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/125/])
HDFS-2877. If locking of a storage dir fails, it will remove the other NN's lock file on exit. Contributed by Todd Lipcon.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1239878
Files :
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/common/Storage.java
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198474#comment-13198474 ]
Hadoop QA commented on HDFS-2877:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12512884/hdfs-2877.txt
against trunk revision .
    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .
    +1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1831//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1831//console
This message is automatically generated.
[jira] [Commented] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198589#comment-13198589 ]
Uma Maheswara Rao G commented on HDFS-2877:
Yes, I also mentioned this before in [https://issues.apache.org/jira/browse/HDFS-1690?focusedCommentId=13046348&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13046348]. I have not had time to work on HDFS-1690 recently; I will upload a patch for that soon as well. :-)
I just verified this patch. It works as expected, and the code changes look good to me.
+1 from my side.
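For readers following the thread, the behaviour described in the issue can be illustrated with a minimal Java sketch. It is not the committed patch: `StorageLockSketch` and its `main` method are hypothetical names chosen for illustration, and only the standard `java.io` / `java.nio` calls are real. What it tries to show is the ordering: `deleteOnExit()` is registered only after `tryLock()` has actually returned a lock, so a failed start attempt leaves the other NN's lock file alone.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical stand-in for the locking logic; not the actual Storage class. */
final class StorageLockSketch {

    /**
     * Tries to take an exclusive lock on lockF.
     * Returns the lock, or null if another holder already has it.
     */
    static FileLock tryLock(File lockF) throws IOException {
        RandomAccessFile file = new RandomAccessFile(lockF, "rws");
        FileLock res = null;
        try {
            // Returns null when another process holds the lock.
            res = file.getChannel().tryLock();
        } catch (OverlappingFileLockException oe) {
            // The lock is already held inside this JVM: treat as "not acquired".
        } catch (IOException e) {
            file.close();
            throw e;
        }
        if (res == null) {
            // We do not own the lock, so we must not schedule the file for
            // deletion: it belongs to whoever holds the lock.
            file.close();
            return null;
        }
        // Only the owner of the lock cleans up the lock file on JVM exit.
        lockF.deleteOnExit();
        return res;
    }

    public static void main(String[] args) throws IOException {
        File lockF = File.createTempFile("in_use", ".lock");
        FileLock first = tryLock(lockF);
        FileLock second = tryLock(lockF);
        System.out.println("first attempt locked:  " + (first != null));
        System.out.println("second attempt locked: " + (second != null));
        if (first != null) {
            first.release();
            first.channel().close();
        }
    }
}
```

Within a single JVM the second attempt fails through `OverlappingFileLockException`; a second NN process would instead see `tryLock()` return null. In both cases the sketch returns null without registering the file for deletion.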