[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082047#comment-13082047 ]

Hudson commented on HDFS-1981:
------------------------------

Integrated in Hadoop-Hdfs-22-branch #73 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/73/])
    HDFS-1981. svn merge -c 1151666 from trunk to branch-0.22.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151668
Files :
* /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java

> When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
> --------------------------------------------------------------------------------------------------------------
>
>              Key: HDFS-1981
>              URL: https://issues.apache.org/jira/browse/HDFS-1981
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: name-node
> Affects Versions: 0.22.0
>      Environment: Linux
>         Reporter: ramkrishna.s.vasudevan
>         Assignee: Uma Maheswara Rao G
>         Priority: Blocker
>          Fix For: 0.22.0
>
>      Attachments: HDFS-1981-1.patch, HDFS-1981-2.patch, HDFS-1981.patch, HDFS-1981_0.22.patch, HDFS-1981_0.23.patch
>
>
> This scenario is applicable in NN and BNN case.
> When the namenode goes down after creating the edits.new, on subsequent restart the divertFileStreams will not happen to edits.new as the edits.new file is already present and the size is zero.
> so on trying to saveCheckPoint an exception occurs
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2011-05-23 16:38:56 but new checkpoint was created using editlog with timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> This is a bug or is that the behaviour.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
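The "Checkpoint Aborted" message in the quoted description suggests that the namenode compares the timestamp of its own edit log with the timestamp of the edit log the checkpoint was built from. The following is only a rough sketch of such a comparison, written to make the failure mode concrete. The class `CheckpointTimeSketch` and the method `validate` are hypothetical names and are not taken from the Hadoop source.

```java
import java.io.IOException;
import java.time.LocalDateTime;

// Hypothetical model of the check implied by the log message in the issue.
// This is not Hadoop code; the names exist only for this illustration.
class CheckpointTimeSketch {

    /** Rejects a checkpoint whose edit-log timestamp differs from the namenode's. */
    static void validate(LocalDateTime namenodeEditsTime,
                         LocalDateTime checkpointEditsTime) throws IOException {
        if (!namenodeEditsTime.equals(checkpointEditsTime)) {
            throw new IOException("Namenode has an edit log with timestamp of "
                + namenodeEditsTime
                + " but new checkpoint was created using editlog with timestamp "
                + checkpointEditsTime + ". Checkpoint Aborted.");
        }
    }

    public static void main(String[] args) {
        // Timestamps taken from the log line quoted in the issue description.
        LocalDateTime onNamenode = LocalDateTime.of(2011, 5, 23, 16, 38, 56);
        LocalDateTime inCheckpoint = LocalDateTime.of(2011, 5, 23, 16, 37, 30);
        try {
            validate(onNamenode, inCheckpoint);
            System.out.println("checkpoint accepted");
        } catch (IOException e) {
            System.out.println("checkpoint rejected: " + e.getMessage());
        }
    }
}
```

Under this reading, every later checkpoint attempt fails for as long as the two timestamps disagree, which matches the "always failing" wording of the issue title.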
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079839#comment-13079839 ]

Hudson commented on HDFS-1981:
------------------------------

Integrated in Hadoop-Hdfs-trunk #738 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/738/])
    HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. Contributed by Uma Maheswara Rao G.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
Files :
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072105#comment-13072105 ]

Hudson commented on HDFS-1981:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #811 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/811/])
    HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. Contributed by Uma Maheswara Rao G.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
Files :
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072099#comment-13072099 ]

Konstantin Shvachko commented on HDFS-1981:
-------------------------------------------

+1 on both patches.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071697#comment-13071697 ]

Uma Maheswara Rao G commented on HDFS-1981:
-------------------------------------------

*Reason for above failures:*
This patch is based on the 0.22 branch, so HDFS-1981_0.22.patch cannot compile on trunk directly.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071692#comment-13071692 ]

Hadoop QA commented on HDFS-1981:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12487967/HDFS-1981_0.22.patch
  against trunk revision 1151344.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    -1 javac. The patch appears to cause tar ant target to fail.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:

    -1 contrib tests. The patch failed contrib unit tests.

    -1 system test framework. The patch failed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//console

This message is automatically generated.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071521#comment-13071521 ]

Uma Maheswara Rao G commented on HDFS-1981:
-------------------------------------------

Thanks Todd,
I will update the patch for the 0.22 branch.
I think we can merge the tests in HDFS-1073 / 0.23 as well for more coverage. What do you say?

--thanks
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071513#comment-13071513 ]

Todd Lipcon commented on HDFS-1981:
-----------------------------------

Hi Uma. Since the merge of HDFS-1073 is imminent, and this bug is not present in HDFS-1073, I think it's best to target only 0.22 for this patch.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071511#comment-13071511 ]

Uma Maheswara Rao G commented on HDFS-1981:
-------------------------------------------

Hi Konstantin,

I have provided the patch against the 0.23 version. Are you expecting a patch against 0.22?
I think 0.22 has not been officially released yet, right? That is why I provided the patch directly on trunk.
If you are expecting a patch specifically for the 0.22 branch, I can provide it. Is it required on 0.22 as well?

The current patch can be committed on trunk.

--Thanks
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071279#comment-13071279 ]

Hadoop QA commented on HDFS-1981:
---------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12487871/HDFS-1981_0.23.patch
  against trunk revision 1150960.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//console

This message is automatically generated.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070893#comment-13070893 ]

Konstantin Shvachko commented on HDFS-1981:
-------------------------------------------

The patch looks good, except it is not compiling now.
You should not remove the two imports from FSImage.
In TestFSImage.testLoadFsEditsShouldReturnTrueWhenEditsNewExists()
- getNameDirs() should not take parameters
- FSImage does not have getStorage() method
- Also member conf is not used anywhere in the test, can be removed

If you could update the patch, I'll commit it.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062011#comment-13062011 ]

ramkrishna.s.vasudevan commented on HDFS-1981:
----------------------------------------------

Hi Todd,

Thanks for your comments. I have reworked some of them.
I think you reviewed the old patch and not the patch named HDFS-1981-1.patch.

Anyway, I have addressed the remaining comments in the latest patch as well:

* As Konstantin said, please use Junit 4 (annotations API) instead of Junit 3, and use the MiniDFSCluster builder
  Already addressed in the previous patch.
* typo: NEW_EIDTS_STREAM
  Changed to NEW_EDITS_STREAM.
* don't use the string constant "dfs.name.dir" - there are constants in DFSConfigKeys for this
  Updated.
* "false == editsNew.exists()" ?? !editsNew.exists()
  Updated.
* TODOs in the test case. don't swallow exceptions
  Updated.
* you can use IOUtils.cleanup or IOUtils.closeStream in the finally block inside of the block
  Updated.
* no need to clear editsStreams in teardown method - it's an instance var so it will be recreated for each case anyway
  Updated.
* what's the purpose of the setup which creates bImg? It's not used in any of the test cases.
  Instead of using the variable bImg, the test now creates a local instance.
* assertion text is wrong: "image should be deleted" -- but it's checking that "edits.new" should be deleted.
  Fixed in the previous patch, as per the latest fix suggested by Konstantin.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061698#comment-13061698 ]

Todd Lipcon commented on HDFS-1981:
-----------------------------------

- As Konstantin said, please use Junit 4 (annotations API) instead of Junit 3, and use the MiniDFSCluster builder
- typo: NEW_EIDTS_STREAM
- don't use the string constant "dfs.name.dir" - there are constants in DFSConfigKeys for this
- "false == editsNew.exists()" ?? !editsNew.exists()
- TODOs in the test case. don't swallow exceptions
- you can use IOUtils.cleanup or IOUtils.closeStream in the finally block inside of the block
- no need to clear editsStreams in teardown method - it's an instance var so it will be recreated for each case anyway
- what's the purpose of the setup which creates bImg? It's not used in any of the test cases.
- assertion text is wrong: "image should be deleted" -- but it's checking that "edits.new" should be deleted.
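A few of the review points above (the `!editsNew.exists()` form, closing streams in a `finally` block, and not swallowing exceptions) can be illustrated without any Hadoop dependency. This is a minimal sketch and not the patch under review: `ReviewPointsSketch`, `writeMarker`, `isMissing`, and `closeQuietly` are hypothetical names, and `closeQuietly` is only a local stand-in for a helper such as `IOUtils.closeStream`.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration of the review comments; not part of the patch.
class ReviewPointsSketch {

    /** Local stand-in for a close-and-ignore helper such as IOUtils.closeStream. */
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done about a failed close during cleanup.
        }
    }

    /** Writes one byte; any IOException from the write propagates to the caller. */
    static void writeMarker(File target) throws IOException {
        OutputStream out = null;
        try {
            out = new FileOutputStream(target);
            out.write(1);
        } finally {
            closeQuietly(out); // the stream is released on every path
        }
    }

    /** Idiomatic negation, rather than "false == editsNew.exists()". */
    static boolean isMissing(File editsNew) {
        return !editsNew.exists();
    }
}
```

In a JUnit 4 test the `throws IOException` clause would simply go on the `@Test` method, so a failure surfaces as a test error instead of being hidden in an empty `catch`.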
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056591#comment-13056591 ]

Hadoop QA commented on HDFS-1981:
---------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12484445/HDFS-1981-1.patch
  against trunk revision 1140030.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/860//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/860//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/860//console

This message is automatically generated.
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054781#comment-13054781 ]

Konstantin Shvachko commented on HDFS-1981:
-------------------------------------------

Not sure what introduced it, but the problem is that NN does not saveNamespace() when editsNew is present. This only happens in Ramkrishna's scenario, when editsNew is empty, that is, when you start the checkpoint and fail NN before modifying anything in the namespace.

Deleting editsNew is probably valid, but not consistent, since at this stage NN is in read-only mode. That is, if something goes wrong we should leave the storage directory in exactly the same state as it was before the startup.

I propose to increment numEdits if editsNew exists. This will trigger saving the namespace after loading. So just a one-line change:
{code}
  if (editsNew.exists() && editsNew.length() > 0) {
+   numEdits++;
    edits = new EditLogFileInputStream(editsNew);
    numEdits += loader.loadFSEdits(edits);
    edits.close();
  }
{code}
Well, maybe not one line, as you need to increment even if {{editsNew.length() == 0}}.

Your test should work in this case as well. Could you please convert it to JUnit4 and use {{MiniDFSCluster.Builder}} instead of a direct constructor.

> When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1981
>                 URL: https://issues.apache.org/jira/browse/HDFS-1981
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0
>         Environment: Linux
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1981.patch
>
>
> This scenario is applicable in NN and BNN case.
> When the namenode goes down after creating the edits.new, on subsequent restart the divertFileStreams will not happen to edits.new as the edits.new file is already present and the size is zero.
> so on trying to saveCheckPoint an exception occurs
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2011-05-23 16:38:56 but new checkpoint was created using editlog with timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> This is a bug or is that the behaviour.
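The proposal in the comment above, including the follow-up remark that the increment is needed even when editsNew.length() == 0, can be sketched roughly as follows. This is a minimal standalone illustration under stated assumptions, not the committed patch: the class EditsNewSketch, the method countEdits() and the line-counting loadEdits() stand-in are hypothetical names introduced here, whereas the real code reads the file through EditLogFileInputStream and loader.loadFSEdits().

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class EditsNewSketch {

    /** Hypothetical stand-in for loader.loadFSEdits(): one edit record per line. */
    static int loadEdits(File f) throws IOException {
        return Files.readAllLines(f.toPath()).size();
    }

    /**
     * Returns the edit count used to decide whether the namespace must be
     * saved after loading. The mere presence of edits.new counts as one
     * edit, so an empty file left behind by an interrupted checkpoint
     * still yields a non-zero result.
     */
    static int countEdits(File editsNew) throws IOException {
        int numEdits = 0;
        if (editsNew.exists()) {
            numEdits++;                          // increment even if the file is empty
            if (editsNew.length() > 0) {         // only parse a file that has content
                numEdits += loadEdits(editsNew);
            }
        }
        return numEdits;
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("edits", ".new");
        empty.deleteOnExit();
        System.out.println("save needed: " + (countEdits(empty) > 0));
    }
}
```

Moving the increment outside the length check is the design point: the storage directory is left untouched during loading, and the cleanup happens only through the regular save that the non-zero count triggers.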
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049924#comment-13049924 ]

Hadoop QA commented on HDFS-1981:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12482669/HDFS-1981.patch
  against trunk revision 1135329.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    -1 javac. The applied patch generated 32 javac compiler warnings (more than the trunk's current 31 warnings).

    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:
                  org.apache.hadoop.cli.TestHDFSCLI
                  org.apache.hadoop.hdfs.TestHDFSTrash

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/786//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/786//console

This message is automatically generated.

> When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1981
>                 URL: https://issues.apache.org/jira/browse/HDFS-1981
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>         Environment: Linux
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1981.patch
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038957#comment-13038957 ]

ramkrishna.s.vasudevan commented on HDFS-1981:
----------------------------------------------

Writing a UT for this may be difficult, because the scenario is hard to reproduce in a test. The steps that I followed to reproduce this issue are:
1. Start the namenode and the backup namenode.
2. Allow checkpointing to happen such that the edits.new file is created on the namenode.
3. At this point kill the NN and BNN.
4. Now start the NN and BNN.
5. When checkpointing starts again, we get the exception quoted in the issue description.

The exact problem comes in the loadFSEdits() API in FSImage.java. If loadFSEdits() returns 0, then in

  if (fsImage.recoverTransitionRead(dataDirs, editsDirs, startOpt)) {
    fsImage.saveNamespace(true);
  }

saveNamespace() will not be invoked.

Kindly correct me if you find any problems in this.

> When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1981
>                 URL: https://issues.apache.org/jira/browse/HDFS-1981
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>         Environment: Linux
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.23.0
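The startup flow described in the comment above can be modelled with a small toy program. It is only a sketch under simplifying assumptions: the Storage class, its fields and startNameNode() are hypothetical, the number of loaded edits is treated as the only trigger for a save (the real recoverTransitionRead() may consider other conditions), and a successful save is assumed to clear edits.new.

```java
class StartupFlowSketch {

    /** Hypothetical model of the state the NameNode finds at startup. */
    static final class Storage {
        boolean editsNewExists;   // edits.new left over from a checkpoint
        int editRecords;          // records replayed from edits and edits.new
        boolean namespaceSaved;   // set when the save step runs
    }

    /** Stand-in for recoverTransitionRead(): true means a save is needed. */
    static boolean recoverTransitionRead(Storage s) {
        int numEdits = s.editRecords;   // an empty edits.new contributes nothing
        return numEdits > 0;
    }

    /** Mirrors the if-block quoted in the comment. */
    static void startNameNode(Storage s) {
        if (recoverTransitionRead(s)) {
            s.namespaceSaved = true;    // stand-in for fsImage.saveNamespace(true)
            s.editsNewExists = false;   // assumption: saving clears edits.new
        }
    }

    public static void main(String[] args) {
        Storage s = new Storage();
        s.editsNewExists = true;        // NN killed right after edits.new was created
        s.editRecords = 0;              // nothing was written before the kill
        startNameNode(s);
        System.out.println("saved=" + s.namespaceSaved
                + ", stale edits.new=" + s.editsNewExists);
    }
}
```

In this model the restart after step 3 skips the save, so the empty edits.new survives into the next checkpoint attempt, which matches the failure reported in step 5.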
[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037949#comment-13037949 ]

Todd Lipcon commented on HDFS-1981:
-----------------------------------

Hi Ramkrishna. Can you provide a unit test which shows this issue? It would be especially good to see such a test against 0.22, since HDFS-1073 will restructure all this code when it's merged into 0.23.

> When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1981
>                 URL: https://issues.apache.org/jira/browse/HDFS-1981
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>         Environment: Linux
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.23.0