Re: Help - can't start namenode after disk full error

2011-10-03 Thread Shouguo Li
Hi Ryan,

I'm trying to recover from a disk-full error on the namenode as well. I can fire
up the namenode after printf "\xff\xff\xff\xee\xff" > /var/name/current/edits,
but now it's stuck in safe mode verifying blocks for hours... Is there a way to
check progress on that? Or is there a way to speed that verification up?
Thx
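
(One way to watch it, assuming a 0.20-era install: poll safe-mode status from
the shell, and check the reported-block ratio on the NameNode web UI, usually
on port 50070. A minimal sketch:

$ hadoop dfsadmin -safemode get    # prints "Safe mode is ON" until the block threshold is reached

The web UI front page shows the "ratio of reported blocks" message while it waits.)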

Help - can't start namenode after disk full error

2011-06-12 Thread Ryan LeCompte
Hey guys,

Really trying to get our namenode back up and running after a full-disk
error last night. I've freed up a lot of space; however, the NameNode still
fails to start up:

2011-06-12 10:26:09,042 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-06-12 10:26:09,083 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 614919
2011-06-12 10:26:22,293 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 17
2011-06-12 10:26:22,300 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 102029859 loaded in 13 seconds.
2011-06-12 10:26:22,510 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode:
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:468)
at java.lang.Short.parseShort(Short.java:120)
at java.lang.Short.parseShort(Short.java:78)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1269)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:550)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)


We currently have our config setup as follows:

<property>
  <name>dfs.name.dir</name>
  <value>/data1/hadoop/dfs/name,/data2/hadoop/dfs/name,/data3/hadoop/dfs/name,/data4/hadoop/dfs/name</value>
</property>

I've looked in each of those directories for an image/edits.new file, but
only the edits files exist.

Can anyone please guide me on the next step here to get this back up and
running?

Thanks!

Ryan


RE: Help - can't start namenode after disk full error

2011-06-12 Thread Zhong, Andy
Could you back up your edits file, try $ printf "\xff\xff\xff\xee\xff" > edits,
and then start HDFS? It should work. - Andy
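
(A sketch of the full sequence, assuming one of the dfs.name.dir paths from
Ryan's config; if I read the old FSEditLog format right, the four bytes
0xFFFFFFEE are the edits layout version (-18) and the trailing 0xFF is the
OP_INVALID terminator, i.e. an empty edit log:

$ cd /data1/hadoop/dfs/name/current        # one of the dfs.name.dir copies from Ryan's config
$ cp edits /tmp/edits.bak                  # back up the broken edits file first
$ printf "\xff\xff\xff\xee\xff" > edits    # overwrite it with an empty edit log
$ start-dfs.sh                             # then restart HDFS

If several dfs.name.dir copies are configured, the same reset presumably needs
to be applied to the edits file in each of them.)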



RE: Help - can't start namenode after disk full error

2011-06-12 Thread Zhong, Andy
Only apply it to the /dfs/name/current/edits file...



Re: Help - can't start namenode after disk full error

2011-06-12 Thread Ryan LeCompte
That worked, thanks!



Re: Help - can't start namenode after disk full error

2011-06-12 Thread Zhong, Sheng
My pleasure!


Re: can't start namenode

2010-05-19 Thread Ted Yu
We encountered a similar issue with hadoop-0.20.2+228 in QA:

2010-05-19 07:12:19,976 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-05-19 07:12:19,978 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-05-19 07:12:20,041 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2010-05-19 07:12:20,041 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-05-19 07:12:20,041 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-05-19 07:12:20,050 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-05-19 07:12:20,052 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-05-19 07:12:20,091 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 1874
2010-05-19 07:12:20,503 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 2
2010-05-19 07:12:20,787 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 259450 loaded in 0 seconds.
2010-05-19 07:12:21,176 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode:
java.lang.NumberFormatException: For input string: ""
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:431)
   at java.lang.Long.parseLong(Long.java:468)
   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:656)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:999)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
   at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)

2010-05-19 07:12:21,177 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

I don't see edits.new under name.dir/current/

Please advise what to do next.

Thanks


Re: can't start namenode

2010-03-04 Thread mike anderson
We have a single dfs.name.dir directory; in case it's useful, the contents
are:

[m...@carr name]$ ls -l
total 8
drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image




On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon t...@cloudera.com wrote:

 Hi Mike,

 Was your namenode configured with multiple dfs.name.dir settings?

 If so, can you please reply with ls -l from each dfs.name.dir?

 Thanks
 -Todd

 On Thu, Mar 4, 2010 at 8:57 AM, mike anderson saidthero...@gmail.com
 wrote:

  Our hadoop cluster went down last night when the namenode ran out of hard
  drive space. Trying to restart fails with this exception (see below).

  Since I don't really care that much about losing a day's worth of data or so,
  I'm fine with blowing away the edits file if that's what it takes (we don't
  have a secondary namenode to restore from). I tried removing the edits file
  from the namenode directory, but then it complained about not finding an
  edits file. I touched a blank edits file and I got the exact same exception.

  Any thoughts? I googled around a bit, but to no avail.

  -mike
 
 
  2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
  Initializing RPC Metrics with hostName=NameNode, port=54310
  2010-03-04 10:50:44,772 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
  carr.projectlounge.com/10.0.16.91:54310
  2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
  Initializing JVM Metrics with processName=NameNode, sessionId=null
  2010-03-04 10:50:44,774 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
  Initializing
  NameNodeMeterics using context
  object:org.apache.hadoop.metrics.spi.NullContext
  2010-03-04 10:50:44,816 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 fsOwner=pubget,pubget
  2010-03-04 10:50:44,817 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 supergroup=supergroup
  2010-03-04 10:50:44,817 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
  isPermissionEnabled=true
  2010-03-04 10:50:44,823 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
  Initializing FSNamesystemMetrics using context
  object:org.apache.hadoop.metrics.spi.NullContext
  2010-03-04 10:50:44,825 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
  FSNamesystemStatusMBean
  2010-03-04 10:50:44,849 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files = 2687
  2010-03-04 10:50:45,092 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files under construction = 7
  2010-03-04 10:50:45,095 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Image file of size 347821 loaded in 0 seconds.
  2010-03-04 10:50:45,104 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in
  0 seconds.
  2010-03-04 10:50:45,114 ERROR
  org.apache.hadoop.hdfs.server.namenode.NameNode:
  java.lang.NumberFormatException: For input string: ""
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
  at java.lang.Long.parseLong(Long.java:424)
  at java.lang.Long.parseLong(Long.java:461)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
 
  2010-03-04 10:50:45,115 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at carr.projectlounge.com/10.0.16.91
  ************************************************************/
 



Re: can't start namenode

2010-03-04 Thread Todd Lipcon
Hi Mike,

Since you removed the edits, you restored to an earlier version of the
namesystem. Thus, any files that were deleted since the last checkpoint will
have come back. But, the blocks will have been removed from the datanodes.
So, the NN is complaining since there are some files that have missing
blocks. That is to say, some of your files are corrupt (i.e. unreadable
because the data is gone but the metadata is still there).

In order to force it out of safemode, you can run hadoop dfsadmin -safemode leave.
You should also run hadoop fsck in order to determine which files are
broken, and then probably use the -delete option to remove their metadata.

Thanks
-Todd
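
(Concretely, that sequence might look like the sketch below; using / as the
fsck path is an assumption, so scope it to the affected directories if you can:

$ hadoop dfsadmin -safemode leave    # force the NN out of safe mode
$ hadoop fsck /                      # report files with missing or corrupt blocks
$ hadoop fsck / -delete              # remove the metadata of the corrupt files)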

On Thu, Mar 4, 2010 at 11:37 AM, mike anderson saidthero...@gmail.com wrote:

 Removing edits.new and starting worked, though it didn't seem that
 happy about it. It started up nonetheless, in safe mode, saying that
 "The ratio of reported blocks 0.9948 has not reached the threshold
 0.9990. Safe mode will be turned off automatically." Unfortunately
 this is holding up the restart of hbase.

 About how long does it take to exit safe mode? Is there anything I can
 do to expedite the process?



 On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon t...@cloudera.com wrote:
 
  Sorry, I actually meant ls -l from name.dir/current/
 
  Having only one dfs.name.dir isn't recommended - after you get your system
  back up and running I would strongly suggest running with at least two,
  preferably with one on a separate server via NFS.
 
  Thanks
  -Todd
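
(For example, since dfs.name.dir takes a comma-separated list, a second,
NFS-backed copy could be added as below; the /mnt/nfs mount point is
hypothetical:

<property>
  <name>dfs.name.dir</name>
  <value>/data1/hadoop/dfs/name,/mnt/nfs/namenode/name</value>
</property>)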
 

Re: can't start namenode

2010-03-04 Thread mike anderson
Todd, that did the trick. Thanks to everyone for the quick responses
and effective suggestions.

-Mike

