Re: Backupnode in 1.0.0?

2012-02-23 Thread Jeremy Hansen
Thanks.  Could you clarify what BackupNode does?

-jeremy

On Feb 22, 2012, at 5:24 PM, Joey Echeverria wrote:

 Check out this branch for the 0.22 version of Bigtop:
 
 https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/
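 
 To check out that branch (plain svn; nothing assumed beyond the URL above):
 
   svn checkout https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/ bigtop-0.22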
 
 However, I don't think BackupNode is what you want. It sounds like you
 want HA, which is coming in (hopefully) 0.23.2 and is also available
 today in CDH4b1.
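 
 For reference, bringing up a BackupNode on 0.21/0.22 looks roughly like the
 sketch below (key names assumed from the 0.21-era docs; verify them against
 your release's hdfs-default.xml):
 
   # In hdfs-site.xml on the backup host (assumed key names):
   #   dfs.namenode.backup.address       = backup-host:50100
   #   dfs.namenode.backup.http-address  = backup-host:50105
   bin/hdfs namenode -backup   # tails the NN's edit journal; not an automatic failover target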
 
 -Joey
 
 On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:
 By the way, I don't see anything 0.22 based in the bigtop repos.
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:
 
 I guess I thought that backupnode would provide some level of namenode 
 redundancy.  Perhaps I don't fully understand.
 
 I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
 Check out the Apache Bigtop project. I believe they have 0.22 RPMs.
 
 Out of curiosity, why are you interested in BackupNode?
 
 -Joey
 
 Sent from my iPhone
 
 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
 BackupNode is major functionality, with changes required in RPC protocols,
 configuration, etc. Hence it will not be available in the bug fix release
 1.0.1.
 
 It is also unlikely to be available in minor releases on the 1.x release
 stream.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la 
 wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0? Any chance it's in
 1.0.1?
 
 Thanks
 -jeremy
 
 
 
 
 
 
 -- 
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen

It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1?

Thanks
-jeremy

Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
I guess I thought that backupnode would provide some level of namenode 
redundancy.  Perhaps I don't fully understand.

I'll check out Bigtop.  I looked at it a while ago and forgot about it.

Thanks
-jeremy

On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:

 Check out the Apache Bigtop project. I believe they have 0.22 RPMs. 
 
 Out of curiosity, why are you interested in BackupNode?
 
 -Joey
 
 Sent from my iPhone
 
 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
 BackupNode is major functionality, with changes required in RPC protocols,
 configuration, etc. Hence it will not be available in the bug fix release 1.0.1.
 
 It is also unlikely to be available in minor releases on the 1.x release
 stream.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0? Any chance it's in
 1.0.1?
 
 Thanks
 -jeremy
 



Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
By the way, I don't see anything 0.22 based in the bigtop repos.

Thanks
-jeremy

On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:

 I guess I thought that backupnode would provide some level of namenode 
 redundancy.  Perhaps I don't fully understand.
 
 I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
 Check out the Apache Bigtop project. I believe they have 0.22 RPMs. 
 
 Out of curiosity, why are you interested in BackupNode?
 
 -Joey
 
 Sent from my iPhone
 
 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
 BackupNode is major functionality, with changes required in RPC protocols,
 configuration, etc. Hence it will not be available in the bug fix release 1.0.1.
 
 It is also unlikely to be available in minor releases on the 1.x release
 stream.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0? Any chance it's in
 1.0.1?
 
 Thanks
 -jeremy
 
 



Re: IMAGE_AND_EDITS Failed

2011-09-07 Thread Jeremy Hansen
The problem is that fsimage and edits are no longer being updated, so…if I 
restart, how could it replay those?

-jeremy


On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:

 Actually I take that back. Restarting the NN might not result in loss of
 data. It will probably just take longer to start up because it would read
 the fsimage, then apply the fsedits (rather than the SNN doing it).
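 
 One way to gauge how much edit replay a restart would involve (a sketch,
 assuming 0.20-style tooling is on the path):
 
   hadoop secondarynamenode -geteditsize   # asks the NN for the current edit log size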
 
 On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash ravihad...@gmail.com wrote:
 
 Hi Jeremy,
 
 Couple of questions:
 
 1. Which version of Hadoop are you using?
 2. If you write something into HDFS, can you subsequently read it?
 3. Are you sure your secondarynamenode configuration is correct? It seems
 like your SNN is telling your NN to roll the edit log (move the journaling
 directory from current to .new), but when it tries to download the image
 file, it's not finding it (see the sketch after this list).
 4. I wish I could say I haven't ever seen that stack trace in the logs. I
 was seeing something similar (not the same, quite far from it actually)
 (https://issues.apache.org/jira/browse/HDFS-2011).
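 
 A quick way to exercise item 3 by hand is to hit the same servlet the SNN
 uses, with the URL taken from the SNN error later in this thread (a sketch;
 substitute your own NN hostname):
 
   curl -f 'http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1' -o /tmp/fsimage.test
   # -f makes curl exit non-zero on the same 404/FileNotFound the SNN logs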
 
 If I were you, and I felt exceptionally brave (mind you, I've worked with
 only test systems, no production sys-admin guts for me ;-) ), I would
 probably do everything I can to get the secondarynamenode started properly
 and make it checkpoint properly.
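 
 If you try that, on a 0.20-style deployment the commands would look roughly
 like this (a sketch; flags assumed from the stock SecondaryNameNode usage):
 
   hadoop-daemon.sh stop secondarynamenode
   hadoop-daemon.sh start secondarynamenode    # then watch the SNN log for the next attempt
   hadoop secondarynamenode -checkpoint force  # or force a checkpoint immediately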
 
 Methinks restarting the namenode will most likely result in loss of data.
 
 Hope this helps
 Ravi.
 
 
 
 
 On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 I happened to notice this today, and being fairly new to administering
 Hadoop, I'm not exactly sure how to pull out of this situation without data
 loss.
 
 The checkpoint hasn't happened since Sept 2nd.
 
 -rw-r--r-- 1 hdfs hdfs      8889 Sep  2 14:09 edits
 -rw-r--r-- 1 hdfs hdfs 195968056 Sep  2 14:09 fsimage
 -rw-r--r-- 1 hdfs hdfs 195979439 Sep  2 14:09 fsimage.ckpt
 -rw-r--r-- 1 hdfs hdfs         8 Sep  2 14:09 fstime
 -rw-r--r-- 1 hdfs hdfs       100 Sep  2 14:09 VERSION
 
 /mnt/data0/dfs/nn/image
 -rw-r--r-- 1 hdfs hdfs       157 Sep  2 14:09 fsimage
 
 I'm also seeing this in the NN logs:
 
 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 Roll Edit Log from 10.10.10.11
 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage:
 java.io.IOException: GetImage failed. java.lang.NullPointerException
   at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
   at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 On the secondary name node:
 
 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
 java.io.FileNotFoundException: http://ftrr-nam6000

Re: IMAGE_AND_EDITS Failed

2011-09-07 Thread Jeremy Hansen
Things still work in HDFS, but the edits file is not being updated. The
timestamp is Sept 2nd.

-jeremy

On Sep 7, 2011, at 9:45 AM, Ravi Prakash ravihad...@gmail.com wrote:

 If your HDFS is still working, the fsimage file won't be getting updated but
 the edits file still should. That's why I asked question 2.
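 
 A quick probe of that (paths assumed from the dfs.name.dir listing quoted
 later in this thread; adjust to your layout):
 
   hadoop fs -touchz /tmp/edits-probe       # any namespace mutation should append to edits
   ls -l /mnt/data0/dfs/nn/current/edits    # mtime should advance if the journal is healthy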
 
 On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 The problem is that fsimage and edits are no longer being updated, so…if I
 restart, how could it replay those?
 
 -jeremy
 
 
 On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:
 
 Actually I take that back. Restarting the NN might not result in loss of
 data. It will probably just take longer to start up because it would read
 the fsimage, then apply the fsedits (rather than the SNN doing it).
 
 On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash ravihad...@gmail.com
 wrote:
 
 Hi Jeremy,
 
 Couple of questions:
 
 1. Which version of Hadoop are you using?
 2. If you write something into HDFS, can you subsequently read it?
 3. Are you sure your secondarynamenode configuration is correct? It seems
 like your SNN is telling your NN to roll the edit log (move the journaling
 directory from current to .new), but when it tries to download the image
 file, it's not finding it.
 4. I wish I could say I haven't ever seen that stack trace in the logs. I
 was seeing something similar (not the same, quite far from it actually)
 (https://issues.apache.org/jira/browse/HDFS-2011).
 
 If I were you, and I felt exceptionally brave (mind you, I've worked with
 only test systems, no production sys-admin guts for me ;-) ), I would
 probably do everything I can to get the secondarynamenode started properly
 and make it checkpoint properly.
 
 Methinks restarting the namenode will most likely result in loss of data.
 
 Hope this helps
 Ravi.
 
 
 
 
 On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen jer...@skidrow.la
 wrote:
 
 
 I happened to notice this today, and being fairly new to administering
 Hadoop, I'm not exactly sure how to pull out of this situation without
 data loss.
 
 The checkpoint hasn't happened since Sept 2nd.
 
 -rw-r--r-- 1 hdfs hdfs      8889 Sep  2 14:09 edits
 -rw-r--r-- 1 hdfs hdfs 195968056 Sep  2 14:09 fsimage
 -rw-r--r-- 1 hdfs hdfs 195979439 Sep  2 14:09 fsimage.ckpt
 -rw-r--r-- 1 hdfs hdfs         8 Sep  2 14:09 fstime
 -rw-r--r-- 1 hdfs hdfs       100 Sep  2 14:09 VERSION
 
 /mnt/data0/dfs/nn/image
 -rw-r--r-- 1 hdfs hdfs       157 Sep  2 14:09 fsimage
 
 I'm also seeing this in the NN logs:
 
 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 Roll Edit Log from 10.10.10.11
 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage:
 java.io.IOException: GetImage failed. java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
  at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
  at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
  at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser

IMAGE_AND_EDITS Failed

2011-09-06 Thread Jeremy Hansen


I happened to notice this today, and being fairly new to administering
Hadoop, I'm not exactly sure how to pull out of this situation without
data loss.


The checkpoint hasn't happened since Sept 2nd.

-rw-r--r-- 1 hdfs hdfs      8889 Sep  2 14:09 edits
-rw-r--r-- 1 hdfs hdfs 195968056 Sep  2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs 195979439 Sep  2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs         8 Sep  2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs       100 Sep  2 14:09 VERSION

/mnt/data0/dfs/nn/image
-rw-r--r-- 1 hdfs hdfs       157 Sep  2 14:09 fsimage

I'm also seeing this in the NN logs:

2011-09-06 16:48:23,738 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: 
GetImage failed. java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
at 
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
at 
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at 
org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

On the secondary name node:

2011-09-06 16:51:53,538 ERROR 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: 
java.io.FileNotFoundException: 
http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
at