Re: Backupnode in 1.0.0?
Thanks. Could you clarify what BackupNode does?

-jeremy

On Feb 22, 2012, at 5:24 PM, Joey Echeverria wrote:

Check out this branch for the 0.22 version of Bigtop:
https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/

However, I don't think BackupNode is what you want. It sounds like you want HA, which is (hopefully) coming in 0.23.2 and is also available today in CDH4b1.

-Joey

On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:

By the way, I don't see anything 0.22-based in the Bigtop repos.

Thanks
-jeremy

On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:

I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it.

Thanks
-jeremy

On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:

Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode?

-Joey

Sent from my iPhone

On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:

Any possibility of getting spec files to create packages for 0.22?

Thanks
-jeremy

On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:

BackupNode is major functionality that requires changes to RPC protocols, configuration, etc. Hence it will not be available in the bug-fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream.

Regards,
Suresh

On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:

It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1?

Thanks
-jeremy

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
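For readers wondering what configuring a BackupNode actually looks like: in the 0.21/0.22 line that does ship it, the BackupNode is pointed at by a couple of hdfs-site.xml properties and started with a dedicated namenode flag. A minimal sketch, assuming 0.21/0.22-era property names; the hostname and ports are placeholders, not values from this thread:

```xml
<!-- hdfs-site.xml (0.21/0.22-era sketch; backup-host and ports are example values) -->
<property>
  <name>dfs.namenode.backup.address</name>
  <value>backup-host:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>backup-host:50105</value>
</property>
```

The BackupNode process itself is then started with `hdfs namenode -backup` (or `-checkpoint` for the checkpoint-only mode). It maintains an in-memory copy of the namespace by streaming the edit journal from the active NameNode, which is the RPC-protocol change Suresh refers to, and why it cannot simply be backported to a 1.0.x bug-fix release.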
Backupnode in 1.0.0?
It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1?

Thanks
-jeremy
Re: Backupnode in 1.0.0?
I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it.

Thanks
-jeremy

On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:

Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode?

-Joey
Re: Backupnode in 1.0.0?
By the way, I don't see anything 0.22-based in the Bigtop repos.

Thanks
-jeremy

On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:

I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it.

Thanks
-jeremy
Re: IMAGE_AND_EDITS Failed
The problem is that fsimage and edits are no longer being updated, so…if I restart, how could it replay those?

-jeremy

On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:

Actually, I take that back. Restarting the NN might not result in loss of data. It will probably just take longer to start up, because it would read the fsimage and then apply the fsedits (rather than the SNN doing it).

On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash ravihad...@gmail.com wrote:

Hi Jeremy,

A couple of questions:

1. Which version of Hadoop are you using?
2. If you write something into HDFS, can you subsequently read it?
3. Are you sure your secondarynamenode configuration is correct? It seems like your SNN is telling your NN to roll the edit log (move the journaling directory from current to .new), but when it tries to download the image file, it's not finding it.
4. I wish I could say I haven't ever seen that stack trace in the logs. I was seeing something similar (not the same, quite far from it actually) (https://issues.apache.org/jira/browse/HDFS-2011).

If I were you, and I felt exceptionally brave (mind you, I've worked with only test systems, no production sys-admin guts for me ;-) ), I would probably do everything I can to get the secondarynamenode started properly and make it checkpoint properly. Methinks restarting the namenode will most likely result in loss of data.

Hope this helps,
Ravi

On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen jer...@skidrow.la wrote:

I happened to notice this today and, being fairly new to administering hadoop, I'm not exactly sure how to pull out of this situation without data loss. The checkpoint hasn't happened since Sept 2nd.
Re: IMAGE_AND_EDITS Failed
Things still work in hdfs, but the edits file is not being updated. Timestamp is Sept 2nd.

-jeremy

On Sep 7, 2011, at 9:45 AM, Ravi Prakash ravihad...@gmail.com wrote:

If your HDFS is still working, the fsimage file won't be getting updated, but the edits file still should. That's why I asked question 2.
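The "timestamp is Sept 2nd" observation above is easy to turn into a monitoring check: if nothing under the NameNode's storage directory has been modified within some window, either journaling or checkpointing has stalled. A minimal sketch, assuming the directory path and the two-hour threshold as example values (neither comes from this thread):

```python
import os
import time

def checkpoint_age_seconds(name_dir):
    """Return seconds since any file under name_dir was last modified."""
    newest = 0.0
    for root, _dirs, files in os.walk(name_dir):
        for f in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, f)))
    if newest == 0.0:
        raise RuntimeError("no files found under %s" % name_dir)
    return time.time() - newest

def checkpoint_is_stale(name_dir, max_age_hours=2):
    """Flag the storage dir as stale if nothing changed within max_age_hours."""
    return checkpoint_age_seconds(name_dir) > max_age_hours * 3600
```

Pointed at the dfs.name.dir (e.g. /mnt/data0/dfs/nn above) from cron, this would alert on stalls like this one: a healthy cluster touches edits continuously and rewrites fsimage on every checkpoint.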
IMAGE_AND_EDITS Failed
I happened to notice this today and, being fairly new to administering hadoop, I'm not exactly sure how to pull out of this situation without data loss. The checkpoint hasn't happened since Sept 2nd.

-rw-r--r-- 1 hdfs hdfs      8889 Sep  2 14:09 edits
-rw-r--r-- 1 hdfs hdfs 195968056 Sep  2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs 195979439 Sep  2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs         8 Sep  2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs       100 Sep  2 14:09 VERSION

/mnt/data0/dfs/nn/image:
-rw-r--r-- 1 hdfs hdfs       157 Sep  2 14:09 fsimage

I'm also seeing this in the NN logs:

2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
	at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
	at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

On the secondary name node:

2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
	at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
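The SecondaryNameNode error above is the SNN failing to download the image file over HTTP from the NameNode's getimage servlet. The same fetch can be reproduced by hand, which helps separate a broken NN image directory from an SNN misconfiguration. A sketch with a placeholder hostname (substitute your own NN host; the servlet path and query string are taken from the stack traces above):

```shell
# Placeholder NameNode host/port; 50070 is the default NN HTTP port.
NN_HOST="nn.example.com"
NN_PORT=50070
GETIMAGE_URL="http://${NN_HOST}:${NN_PORT}/getimage?getimage=1"
echo "$GETIMAGE_URL"
# Uncomment to fetch the fsimage the same way the SNN does; a failure here
# (e.g. the NPE-backed 404 in the NN log) reproduces the SNN's
# FileNotFoundException:
# curl -f -o fsimage.copy "$GETIMAGE_URL"
```

If the manual fetch fails the same way, the problem is on the NameNode side (as the getImageFile NullPointerException suggests), not in the SNN's checkpoint configuration.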