Re: Help - can't start namenode after disk full error
Hi Ryan, I'm trying to recover from a disk-full error on the namenode as well. I can fire up the namenode after running printf "\xff\xff\xff\xee\xff" > /var/name/current/edits, but now it's been stuck in safe mode verifying blocks for hours... Is there a way to check progress on that, or a way to speed that verification up? Thanks
Help - can't start namenode after disk full error
Hey guys,

Really trying to get our namenode back up and running after a full-disk error last night. I've freed up a lot of space, but the NameNode still fails to start up:

2011-06-12 10:26:09,042 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-06-12 10:26:09,083 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 614919
2011-06-12 10:26:22,293 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 17
2011-06-12 10:26:22,300 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 102029859 loaded in 13 seconds.
2011-06-12 10:26:22,510 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string:
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:468)
    at java.lang.Short.parseShort(Short.java:120)
    at java.lang.Short.parseShort(Short.java:78)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1269)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:550)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

We currently have our config set up as follows:

<property>
  <name>dfs.name.dir</name>
  <value>/data1/hadoop/dfs/name,/data2/hadoop/dfs/name,/data3/hadoop/dfs/name,/data4/hadoop/dfs/name</value>
</property>

I've looked in each of those directories for an image/edits.new file, but only the edits files exist. Can anyone please guide me on the next step here to get this back up and running?

Thanks!
Ryan
RE: Help - can't start namenode after disk full error
Could you back up your edits file, try

$ printf "\xff\xff\xff\xee\xff" > edits

and start HDFS? It should work.

- Andy

-----Original Message-----
From: Ryan LeCompte [mailto:lecom...@gmail.com]
Sent: Sunday, June 12, 2011 9:29 AM
To: common-user@hadoop.apache.org
Subject: Help - can't start namenode after disk full error

This message, including any attachments, is the property of Sears Holdings Corporation and/or one of its subsidiaries. It is confidential and may contain proprietary or legally privileged information. If you are not the intended recipient, please delete it without reading the contents. Thank you.
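Spelled out, Andy's suggestion amounts to the sketch below. The byte meaning is my reading of the 0.20-era edit log format: a 4-byte layout version of -18 (0xffffffee) followed by the OP_INVALID opcode (0xff), i.e. a structurally valid but empty edit log. The path default is illustrative; on Ryan's cluster it would be each <dfs.name.dir>/current/edits.

```shell
# Hedged sketch, assuming the 0.20-era edits format described above.
# EDITS defaults to a scratch file here; point it at the real
# <dfs.name.dir>/current/edits on the namenode.
EDITS="${EDITS:-$(mktemp)}"

# Always keep the corrupt log around in case truncation makes things worse.
cp "$EDITS" "$EDITS.bak"

# Overwrite it with the 5-byte empty log: \377\377\377\356\377 is the
# octal spelling of \xff\xff\xff\xee\xff (portable across printf variants).
printf '\377\377\377\356\377' > "$EDITS"
```

After this, the namenode should load the fsimage plus an empty edits file; any operations that were only in the corrupt edits tail are lost.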
RE: Help - can't start namenode after disk full error
Only apply it to the /dfs/name/current/edits file...

-----Original Message-----
From: Zhong, Andy [mailto:sheng.zh...@searshc.com]
Sent: Sunday, June 12, 2011 9:43 AM
To: common-user@hadoop.apache.org
Subject: RE: Help - can't start namenode after disk full error

Could you back up your edits file, try $ printf "\xff\xff\xff\xee\xff" > edits, and start HDFS? It should work.

- Andy
Re: Help - can't start namenode after disk full error
That worked, thanks!

On Sun, Jun 12, 2011 at 10:47 AM, Zhong, Andy <sheng.zh...@searshc.com> wrote:

Only apply it to the /dfs/name/current/edits file...
Re: Help - can't start namenode after disk full error
My pleasure!

----- Original Message -----
From: Ryan LeCompte [mailto:lecom...@gmail.com]
Sent: Sunday, June 12, 2011 10:59 AM
To: common-user@hadoop.apache.org
Subject: Re: Help - can't start namenode after disk full error

That worked, thanks!
Re: can't start namenode
We encountered a similar issue with hadoop-0.20.2+228 in QA:

2010-05-19 07:12:19,976 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-05-19 07:12:19,978 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-05-19 07:12:20,041 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2010-05-19 07:12:20,041 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-05-19 07:12:20,041 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-05-19 07:12:20,050 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-05-19 07:12:20,052 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-05-19 07:12:20,091 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1874
2010-05-19 07:12:20,503 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
2010-05-19 07:12:20,787 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 259450 loaded in 0 seconds.
2010-05-19 07:12:21,176 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string:
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:431)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:656)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:999)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)
2010-05-19 07:12:21,177 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

I don't see edits.new under name.dir/current/. Please advise what to do next. Thanks
Re: can't start namenode
We have a single dfs.name.dir directory; in case it's useful, the contents are:

[m...@carr name]$ ls -l
total 8
drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image

On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon <t...@cloudera.com> wrote:

Hi Mike, Was your namenode configured with multiple dfs.name.dir settings? If so, can you please reply with ls -l from each dfs.name.dir? Thanks -Todd

On Thu, Mar 4, 2010 at 8:57 AM, mike anderson <saidthero...@gmail.com> wrote:

Our hadoop cluster went down last night when the namenode ran out of hard drive space. Trying to restart fails with this exception (see below). Since I don't really care that much about losing a day's worth of data or so, I'm fine with blowing away the edits file if that's what it takes (we don't have a secondary namenode to restore from). I tried removing the edits file from the namenode directory, but then it complained about not finding an edits file. I touched a blank edits file and got the exact same exception. Any thoughts? I googled around a bit, but to no avail.

-mike

2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
2010-03-04 10:50:44,772 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: carr.projectlounge.com/10.0.16.91:54310
2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-03-04 10:50:44,774 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-04 10:50:44,816 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=pubget,pubget
2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-03-04 10:50:44,823 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-04 10:50:44,825 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-03-04 10:50:44,849 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2687
2010-03-04 10:50:45,092 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
2010-03-04 10:50:45,095 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 347821 loaded in 0 seconds.
2010-03-04 10:50:45,104 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in 0 seconds.
2010-03-04 10:50:45,114 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string:
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:424)
    at java.lang.Long.parseLong(Long.java:461)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-03-04 10:50:45,115 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at carr.projectlounge.com/10.0.16.91
************************************************************/
Re: can't start namenode
Hi Mike,

Since you removed the edits, you restored to an earlier version of the namesystem. Thus, any files that were deleted since the last checkpoint will have come back. But the blocks will have been removed from the datanodes, so the NN is complaining since there are some files that have missing blocks. That is to say, some of your files are corrupt (i.e. unreadable, because the data is gone but the metadata is still there).

In order to force it out of safe mode, you can run: hadoop dfsadmin -safemode leave

You should also run hadoop fsck in order to determine which files are broken, and then probably use the -delete option to remove their metadata.

Thanks
-Todd

On Thu, Mar 4, 2010 at 11:37 AM, mike anderson <saidthero...@gmail.com> wrote:

Removing edits.new and starting worked, though it didn't seem that happy about it. It started up nonetheless, in safe mode, saying that "The ratio of reported blocks 0.9948 has not reached the threshold 0.9990. Safe mode will be turned off automatically." Unfortunately this is holding up the restart of hbase. About how long does it take to exit safe mode? Is there anything I can do to expedite the process?

On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon <t...@cloudera.com> wrote:

Sorry, I actually meant ls -l from name.dir/current/. Having only one dfs.name.dir isn't recommended - after you get your system back up and running I would strongly suggest running with at least two, preferably with one on a separate server via NFS.

Thanks
-Todd
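For reference, Todd's two suggestions map onto the commands below on a 0.20-era cluster. The `-safemode get` call is an addition here, useful for the "how do I check progress" question: it reports whether safe mode is still on, and the reported-blocks ratio is also visible on the namenode web UI. These run against a live cluster, so treat them as a sketch rather than something to paste blindly.

```shell
# Is the namenode still in safe mode? (Ratio details are on the NN web UI,
# typically http://<namenode>:50070/ in this era.)
hadoop dfsadmin -safemode get

# Force the NN out of safe mode once you accept that some blocks are gone.
hadoop dfsadmin -safemode leave

# List the filesystem's health and find the corrupt/missing-block files...
hadoop fsck /

# ...then remove the metadata of the broken files.
hadoop fsck / -delete
```

Note that `-safemode leave` does not recover any data; it just stops the NN from waiting for block reports that will never reach the threshold.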
Re: can't start namenode
Todd,

That did the trick. Thanks to everyone for the quick responses and effective suggestions.

-Mike
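Todd's earlier recommendation to run with at least two dfs.name.dir entries, one on another server via NFS, would look something like the hdfs-site.xml stanza below. The paths and NFS mount point are illustrative, not from this thread.

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- The NN writes the fsimage and edits to every listed directory,
       so a second copy on an NFS mount survives loss of the local disk. -->
  <value>/data1/hadoop/dfs/name,/mnt/nfs-backup/hadoop/dfs/name</value>
</property>
```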