[ https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022920#comment-13022920 ]
Eli Collins commented on HDFS-1594:
-----------------------------------

Looks great. Minor comments follow.

Could you fold the call to safeMode.setResourcesLow() on line 2891 into enterSafeMode? I.e., I think enterSafeMode(true) should always result in a call to setResourcesLow.

Perhaps include "resource" in the "dfs.nn.du.reserved" and "dfs.nn.checked.volumes" key names so it's clear there's a relationship between them and "dfs.nn.resource.check.interval".

In the NameNodeResourceChecker class header comment, what do you mean by "heap space available on all volumes"?

Would it be hard to write a test that crosses the threshold, e.g. set the limit based on current available space minus say 500KB, then create a large file and assert the NN went into SM?

Nits:
* FSNamesystem line 570: missing space after "if", and extra space after "interrupt". Line 4058 has extra parens.
* isResourcesLow -> areResourcesLow or just resourcesLow
* NameNodeResourceChecker line 118: <= should technically be <, or the message should say something like the available space has reached the reserved amount.
* In the test, extra newline on line 68 ("in case of errors")

> When the disk becomes full Namenode is getting shutdown and not able to recover
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-1594
>                 URL: https://issues.apache.org/jira/browse/HDFS-1594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.21.1, 0.22.0
>         Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Devaraj K
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch, hadoop-root-namenode-linux124.log, hdfs-1594.0.patch, hdfs-1594.1.patch, hdfs-1594.2.patch, hdfs-1594.3.patch, hdfs-1594.4.patch
>
>
> When the disk becomes full, the name node shuts down. If we then try to start it after making space available, it does not start and throws the exception below.
> {code:xml}
> 2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:180)
> 	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:180)
> 	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
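
For reference, two of the review points above (folding setResourcesLow into enterSafeMode, and using < rather than <= in the space check) can be sketched roughly as follows. This is only an illustration and not the code in the patch: the class SafeModeSketch, its fields, and the helper belowReserved are hypothetical stand-ins, and the real FSNamesystem and NameNodeResourceChecker signatures may differ.

```java
// Hypothetical stand-in for the NameNode safe-mode logic discussed in the review.
// None of these names are taken from the actual patch except where the comment
// in the email mentions them (enterSafeMode, setResourcesLow, areResourcesLow).
final class SafeModeSketch {
    private boolean inSafeMode = false;
    private boolean resourcesLow = false;

    // Kept private so callers cannot forget it: enterSafeMode is the only entry point.
    private void setResourcesLow() {
        resourcesLow = true;
    }

    // Folding the call in here means enterSafeMode(true) always records low resources.
    void enterSafeMode(boolean lowResources) {
        inSafeMode = true;
        if (lowResources) {
            setResourcesLow();
        }
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    // Renamed per the nit: areResourcesLow instead of isResourcesLow.
    boolean areResourcesLow() {
        return resourcesLow;
    }

    // Strict comparison: space exactly equal to the reserved amount is not "below" it.
    static boolean belowReserved(long availableBytes, long reservedBytes) {
        return availableBytes < reservedBytes;
    }

    public static void main(String[] args) {
        SafeModeSketch nn = new SafeModeSketch();
        long reserved = 100L * 1024 * 1024; // assumed reserved amount, for illustration only
        long available = reserved - 1;      // one byte under the threshold
        if (belowReserved(available, reserved)) {
            nn.enterSafeMode(true);
        }
        System.out.println("safe mode: " + nn.isInSafeMode()
                + ", resources low: " + nn.areResourcesLow());
    }
}
```

With the strict comparison, available space exactly equal to the reserved amount does not trigger safe mode; if <= is kept instead, the log message would need to say that the available space has reached the reserved amount.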