[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-08-09 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082047#comment-13082047 ]

Hudson commented on HDFS-1981:
--

Integrated in Hadoop-Hdfs-22-branch #73 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-22-branch/73/])
HDFS-1981. svn merge -c 1151666 from trunk to branch-0.22.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151668
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java


> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Assignee: Uma Maheswara Rao G
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1981-1.patch, HDFS-1981-2.patch, HDFS-1981.patch, 
> HDFS-1981_0.22.patch, HDFS-1981_0.23.patch
>
>
> This scenario applies to both the NN and the BNN.
> When the namenode goes down after creating edits.new, then on the subsequent
> restart divertFileStreams does not switch the edit streams to edits.new,
> because the edits.new file is already present and its size is zero.
> So when saveCheckPoint is attempted, the following exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is this the expected behaviour?
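The failure mode can be modelled in a few lines. What follows is a hypothetical, self-contained simulation, not the FSImage code: the class StartupModel, its EditsFile holder and the rule "re-save the image only if some edits were loaded" are assumptions drawn from Konstantin Shvachko's analysis in this thread, and the sketch also assumes that edits itself holds no transactions at crash time. It shows why an empty edits.new contributes nothing on restart, so no saveNamespace() happens and the stale edits.new survives.

```java
// Hypothetical model of the pre-fix startup path described in HDFS-1981.
// Names and the "save only if edits were loaded" rule are assumptions;
// this is not the real FSImage code.
final class StartupModel {

    /** Simplified view of an edits file in the name directory. */
    static final class EditsFile {
        final boolean exists;
        final long length;    // size on disk in bytes
        final int editCount;  // transactions it would yield when read

        EditsFile(boolean exists, long length, int editCount) {
            this.exists = exists;
            this.length = length;
            this.editCount = editCount;
        }
    }

    /** edits.new is read (and counted) only if it exists AND is non-empty. */
    static int editsLoadedOnStartup(EditsFile edits, EditsFile editsNew) {
        int numEdits = edits.exists ? edits.editCount : 0;
        if (editsNew.exists && editsNew.length > 0) {
            numEdits += editsNew.editCount;
        }
        return numEdits;
    }

    /** Assumed rule: the image is re-saved only when some edits were loaded. */
    static boolean savesNamespace(int numEdits) {
        return numEdits > 0;
    }

    public static void main(String[] args) {
        // Assumed crash state: edits holds no transactions and
        // edits.new is present but empty.
        EditsFile edits = new EditsFile(true, 4, 0);
        EditsFile editsNew = new EditsFile(true, 0, 0);
        int loaded = editsLoadedOnStartup(edits, editsNew);
        // prints: edits loaded = 0, saveNamespace = false
        System.out.println("edits loaded = " + loaded
                + ", saveNamespace = " + savesNamespace(loaded));
    }
}
```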

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-08-05 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079839#comment-13079839 ]

Hudson commented on HDFS-1981:
--

Integrated in Hadoop-Hdfs-trunk #738 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/738/])
HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. 
Contributed by Uma Maheswara Rao G.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
Files : 
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java






[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-27 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072105#comment-13072105 ]

Hudson commented on HDFS-1981:
--

Integrated in Hadoop-Hdfs-trunk-Commit #811 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/811/])
HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. 
Contributed by Uma Maheswara Rao G.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
Files : 
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java






[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-27 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072099#comment-13072099 ]

Konstantin Shvachko commented on HDFS-1981:
---

+1 on both patches.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-27 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071697#comment-13071697 ]

Uma Maheswara Rao G commented on HDFS-1981:
---

*Reason for above failures:*
This patch is based on the 0.22 branch, so HDFS-1981_0.22.patch cannot be
compiled directly against trunk.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-27 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071692#comment-13071692 ]

Hadoop QA commented on HDFS-1981:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12487967/HDFS-1981_0.22.patch
  against trunk revision 1151344.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1038//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1038//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//console

This message is automatically generated.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-26 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071521#comment-13071521 ]

Uma Maheswara Rao G commented on HDFS-1981:
---

Thanks Todd,
 I will update the patch for the 0.22 branch.
I think we can also merge the tests into HDFS-1073 / 0.23 for more coverage.
What do you think?

--thanks





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-26 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071513#comment-13071513 ]

Todd Lipcon commented on HDFS-1981:
---

Hi Uma. Since the merge of HDFS-1073 is imminent, and this bug is not present 
in HDFS-1073, I think it's best to target only 0.22 for this patch.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-26 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071511#comment-13071511 ]

Uma Maheswara Rao G commented on HDFS-1981:
---

Hi Konstantin,
 I have provided a patch against the 0.23 version.
Are you expecting a patch for the 0.22 version as well? I think 0.22 has not
been released officially yet, right? That is why I provided the patch directly
against trunk.
If it is required on 0.22 as well, I can provide a patch specifically for the
0.22 branch.
The current patch can be committed to trunk.

--Thanks





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-26 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071279#comment-13071279 ]

Hadoop QA commented on HDFS-1981:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12487871/HDFS-1981_0.23.patch
  against trunk revision 1150960.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1024//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1024//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//console

This message is automatically generated.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-25 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070893#comment-13070893 ]

Konstantin Shvachko commented on HDFS-1981:
---

The patch looks good, except that it does not compile now.
You should not remove the two imports from FSImage.

In TestFSImage.testLoadFsEditsShouldReturnTrueWhenEditsNewExists()
- getNameDirs() should not take parameters
- FSImage does not have getStorage() method
- Also, the member conf is not used anywhere in the test and can be removed

If you could update the patch, I'll commit it.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-08 Thread ramkrishna.s.vasudevan (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062011#comment-13062011 ]

ramkrishna.s.vasudevan commented on HDFS-1981:
--

Hi Todd,

Thanks for your comments.
I have reworked the patch to address them.

I think you reviewed the old patch and not the one named
HDFS-1981-1.patch.

Anyway, I have addressed the remaining comments in the latest patch as well:

*  As Konstantin said, please use Junit 4 (annotations API) instead of 
Junit 3, and use the MiniDFSCluster builder
Already addressed in the previous patch.
* typo: NEW_EIDTS_STREAM
Changed this to NEW_EDITS_STREAM.
* don't use the string constant "dfs.name.dir" - there are constants in 
DFSConfigKeys for this
Updated.
* "false == editsNew.exists()" ?? !editsNew.exists()
Updated.
* TODOs in the test case. don't swallow exceptions
Updated.
* you can use IOUtils.cleanup or IOUtils.closeStream in the finally block 
inside of the block
Updated.
* no need to clear editsStreams in teardown method - it's an instance var 
so it will be recreated for each case anyway
Updated.
* what's the purpose of the setup which creates bImg? It's not used in any 
of the test cases.
Instead of the member variable bImg, the test now creates a local instance.
* assertion text is wrong: "image should be deleted" – but it's checking 
that "edits.new" should be deleted.
Fixed in the previous patch, following the latest fix suggested by Konstantin.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-07-07 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061698#comment-13061698 ]

Todd Lipcon commented on HDFS-1981:
---

- As Konstantin said, please use Junit 4 (annotations API) instead of Junit 3, 
and use the MiniDFSCluster builder
- typo: NEW_EIDTS_STREAM
- don't use the string constant "dfs.name.dir" - there are constants in 
DFSConfigKeys for this
- "false == editsNew.exists()" ?? !editsNew.exists()
- TODOs in the test case. don't swallow exceptions
- you can use IOUtils.cleanup or IOUtils.closeStream in the finally block 
inside of the block
- no need to clear editsStreams in teardown method - it's an instance var so it 
will be recreated for each case anyway
- what's the purpose of the setup which creates bImg? It's not used in any of 
the test cases.
- assertion text is wrong: "image should be deleted" -- but it's checking that 
"edits.new" should be deleted.
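Several of these review points (a negated exists() check, an assertion message that names what is actually checked, not swallowing exceptions, cleaning up in a finally block) can be illustrated without a cluster. The sketch below is a hypothetical, self-contained stand-in for such a test: EditsNewCleanupCheck, simulateRestart and the file layout are assumptions, and the real TestFSImage case would instead be a JUnit 4 @Test method built on MiniDFSCluster.Builder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical plain-Java stand-in for a TestFSImage case; the real test
// would be a JUnit 4 @Test method using MiniDFSCluster.Builder.
final class EditsNewCleanupCheck {

    /** Fails with a message that names what is actually being checked. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Models a restart with the fix: any edits.new triggers a save. */
    static void simulateRestart(Path nameDir) throws IOException {
        Path editsNew = nameDir.resolve("edits.new");
        if (Files.exists(editsNew)) {
            // Model saveNamespace(): start a fresh edits, drop edits.new.
            Files.write(nameDir.resolve("edits"), new byte[0]);
            Files.delete(editsNew);
        }
    }

    /** IOExceptions propagate to the caller instead of being swallowed. */
    static void checkEmptyEditsNewIsRemoved() throws IOException {
        Path nameDir = Files.createTempDirectory("name-dir");
        Path editsNew = nameDir.resolve("edits.new");
        try {
            Files.createFile(editsNew); // crash left an empty edits.new
            simulateRestart(nameDir);
            check(!Files.exists(editsNew), "edits.new should be deleted");
        } finally {
            // Clean up in finally so a failed check leaves no debris.
            Files.deleteIfExists(editsNew);
            Files.deleteIfExists(nameDir.resolve("edits"));
            Files.deleteIfExists(nameDir);
        }
    }

    public static void main(String[] args) throws IOException {
        checkEmptyEditsNewIsRemoved();
        System.out.println("ok");
    }
}
```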





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-28 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056591#comment-13056591 ]

Hadoop QA commented on HDFS-1981:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12484445/HDFS-1981-1.patch
  against trunk revision 1140030.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/860//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/860//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/860//console

This message is automatically generated.





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-24 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054781#comment-13054781 ]

Konstantin Shvachko commented on HDFS-1981:
---

Not sure what introduced it, but the problem is that the NN does not call
saveNamespace() when editsNew is present.
This only happens in Ramakrishna's scenario, when editsNew is empty, that is,
when you start the checkpoint and fail the NN before modifying anything in the
namespace.

Deleting editsNew is probably valid but not consistent, since at this stage
the NN is in read-only mode. That is, if something goes wrong, we should leave
the storage directory in exactly the same state as it was before the startup.

I propose incrementing numEdits if editsNew exists. This will trigger saving
the namespace after loading. So it is just a one-line change:
{code}
  if (editsNew.exists() && editsNew.length() > 0) {
+   numEdits++;
    edits = new EditLogFileInputStream(editsNew);
    numEdits += loader.loadFSEdits(edits);
    edits.close();
  }
{code}
Well, maybe not one line, as you need to increment even if
{{editsNew.length() == 0}}.
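A self-contained sketch of the proposed change, written as a pure function over the two properties of edits.new that the snippet checks (exists() and length()). The increment is placed outside the length check, per the note that it must also apply when editsNew.length() == 0. FixedStartup, loadEditsNew and shouldSaveNamespace are hypothetical names, the loader is reduced to a transaction count, and "save when numEdits > 0" is an assumption about how the caller uses the count.

```java
/**
 * Hypothetical model of the proposed FSImage change (not the actual Hadoop code).
 * The edits.new file is reduced to the two properties the snippet checks.
 */
class FixedStartup {
    /**
     * @param editsNewExists result of editsNew.exists()
     * @param editsNewLength result of editsNew.length()
     * @param txInEditsNew   number of transactions loadFSEdits() would read from it
     * @return the numEdits contribution of edits.new
     */
    static int loadEditsNew(boolean editsNewExists, long editsNewLength, int txInEditsNew) {
        int numEdits = 0;
        if (editsNewExists) {
            // Count edits.new itself, even when it is empty, so that the
            // namespace is saved after loading.
            numEdits++;
            if (editsNewLength > 0) {
                // Only a non-empty file is actually read by the loader.
                numEdits += txInEditsNew;
            }
        }
        return numEdits;
    }

    /** Assumption: the namespace is saved after loading whenever numEdits > 0. */
    static boolean shouldSaveNamespace(int numEdits) {
        return numEdits > 0;
    }
}
```

With an empty edits.new (exists, length 0) this yields numEdits == 1, so shouldSaveNamespace returns true; when edits.new is absent the count is unchanged.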

Your test should work in this case as well. Could you please convert it to
JUnit4 and use {{MiniDFSCluster.Builder}} instead of a direct constructor?

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario applies to both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent 
> restart divertFileStreams is not performed for edits.new, because the edits.new 
> file is already present and its size is zero.
> So when saveCheckPoint is attempted, an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is that the expected behaviour?





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049924#comment-13049924
 ] 

Hadoop QA commented on HDFS-1981:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482669/HDFS-1981.patch
  against trunk revision 1135329.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 32 javac compiler warnings (more 
than the trunk's current 31 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/786//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/786//console


> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario applies to both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent 
> restart divertFileStreams is not performed for edits.new, because the edits.new 
> file is already present and its size is zero.
> So when saveCheckPoint is attempted, an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is that the expected behaviour?





[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038957#comment-13038957
 ] 

ramkrishna.s.vasudevan commented on HDFS-1981:
--

Writing a UT for this may be difficult, as the scenario is hard to reproduce.

The steps that I followed to reproduce this issue are:
1. Start the namenode and backup namenode.
2. Allow checkpointing to happen such that the edits.new file is 
created on the namenode.
3. At this point, kill the NN and BNN.
4. Now start the NN and BNN.
5. When checkpointing starts again, we get the above exception.
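Step 5 fails in a consistency check on the image upload: the error quoted in the description compares the NameNode's edit-log timestamp with the one the checkpoint was created from. The sketch below models only that comparison. CheckpointUploadCheck and validate are hypothetical names, and treating any mismatch as fatal is an assumption inferred from the message text, not taken from the NameNode source.

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical model of the check that produces the "Checkpoint Aborted" error,
 * reconstructed from the log message; the real NameNode code may differ.
 */
class CheckpointUploadCheck {
    private static final String DATE_PATTERN = "yyyy-MM-dd HH:mm:ss";

    /**
     * @param namenodeEditsTime   modification time (ms) of the NameNode's current edit log
     * @param checkpointEditsTime edit-log timestamp (ms) recorded when the checkpoint started
     * @throws IOException if the two timestamps differ, aborting the checkpoint
     */
    static void validate(long namenodeEditsTime, long checkpointEditsTime) throws IOException {
        if (namenodeEditsTime != checkpointEditsTime) {
            SimpleDateFormat fmt = new SimpleDateFormat(DATE_PATTERN);
            throw new IOException("Namenode has an edit log with timestamp of "
                + fmt.format(new Date(namenodeEditsTime))
                + " but new checkpoint was created using editlog with timestamp "
                + fmt.format(new Date(checkpointEditsTime))
                + ". Checkpoint Aborted.");
        }
    }
}
```

Matching timestamps pass silently; any mismatch raises an IOException whose message mirrors the one in the log.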


The exact problem is in the loadFSEdits() API in FSImage.java.

Here, if loadFSEdits() returns 0, then in the following code

if (fsImage.recoverTransitionRead(dataDirs, editsDirs, startOpt)) {
  fsImage.saveNamespace(true);
}

saveNamespace() will not be invoked.
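A minimal model of the control flow described above, assuming (as the comment implies) that recoverTransitionRead() returns true only when loadFSEdits() loaded at least one edit. BuggyStartup and its methods are hypothetical simplifications, not the FSImage code; the edits.new guard mirrors the existing `editsNew.exists() && editsNew.length() > 0` check.

```java
/**
 * Hypothetical model of the current startup path (not the actual FSImage code).
 * Assumption: recoverTransitionRead() reports "something was loaded" only when
 * loadFSEdits() returned a positive edit count.
 */
class BuggyStartup {
    /** Mirrors the existing guard: edits.new is only loaded when non-empty. */
    static int loadEditsNew(boolean editsNewExists, long editsNewLength, int txInEditsNew) {
        int numEdits = 0;
        if (editsNewExists && editsNewLength > 0) {
            numEdits += txInEditsNew;
        }
        return numEdits; // 0 for an empty edits.new
    }

    /** Stand-in for recoverTransitionRead(): true means saveNamespace(true) is called. */
    static boolean recoverTransitionRead(int numEdits) {
        return numEdits > 0;
    }
}
```

Feeding it the state left by step 3 of the reproduction, loadEditsNew(true, 0L, 0), gives 0, so recoverTransitionRead returns false and saveNamespace() is skipped.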

Kindly correct me if you find any problems in this.



> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> This scenario applies to both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent 
> restart divertFileStreams is not performed for edits.new, because the edits.new 
> file is already present and its size is zero.
> So when saveCheckPoint is attempted, an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is that the expected behaviour?



[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-05-23 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037949#comment-13037949
 ] 

Todd Lipcon commented on HDFS-1981:
---

Hi Ramkrishna. Can you provide a unit test which shows this issue? It would be 
especially good to see such a test against 0.22, since HDFS-1073 will 
restructure all this code when it's merged into 0.23.

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> This scenario applies to both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent 
> restart divertFileStreams is not performed for edits.new, because the edits.new 
> file is already present and its size is zero.
> So when saveCheckPoint is attempted, an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is that the expected behaviour?
