Hi Tim,

jps should show it running if it's "hung". In that case, see if you can
get a stack trace using "jstack <pid>". Paste that trace and we may be
able to figure out where it's hung up.

-Todd
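A minimal sketch of that diagnostic, assuming a stock JDK on the
namenode host; the PID and sample output shown are illustrative:

    # List running JVMs with their main class; note the NameNode's PID.
    jps -l
    #   12345 org.apache.hadoop.hdfs.server.namenode.NameNode   (example output)
    # Dump every thread's stack for that PID into a file you can paste.
    jstack 12345 > namenode-stack.txt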
On Thu, Oct 15, 2009 at 12:12 AM, tim robertson <[email protected]> wrote:
> Thanks for the info Todd,
>
>> I have also seen this error, but it's not fatal.
> Ok, I'll ignore it for the time being.
>
>> Is this log from just the NameNode or did you tail multiple logs? It
>> seems odd that your namenode would be trying to make an IPC client to
>> itself (port 8020).
>
> Just the name node.
>
>> After you see these messages, does your namenode shut down?
>
> It kind of just hangs, really ("Already tried 9 time(s)", "Already
> tried 10 time(s)", etc.)
>
>> Does jps show it running? Is the web interface available at port 50070?
>
> No, it never is available.
>
> I'll keep hunting for some config error.
>
> Cheers
> Tim
>
>>
>> -Todd
>>
>> On Wed, Oct 14, 2009 at 2:33 AM, tim robertson
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> Using hadoop 0.20.1 I am seeing the following on my namenode startup.
>>>
>>> 2009-10-14 11:09:54,232 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
>>> java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
>>>
>>> Could someone please point me in the right direction for diagnosing
>>> where I have gone wrong?
>>>
>>> Thanks
>>> Tim
>>>
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG: host = cluma.gbif.org/192.38.28.77
>>> STARTUP_MSG: args = []
>>> STARTUP_MSG: version = 0.20.1
>>> STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep 1 20:55:56 UTC 2009
>>> ************************************************************/
>>> 2009-10-14 11:09:53,010 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
>>> 2009-10-14 11:09:53,013 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: cluma.gbif.clu/192.168.76.254:8020
>>> 2009-10-14 11:09:53,015 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>> 2009-10-14 11:09:53,019 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2009-10-14 11:09:53,056 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
>>> 2009-10-14 11:09:53,057 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2009-10-14 11:09:53,057 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
>>> 2009-10-14 11:09:53,064 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2009-10-14 11:09:53,065 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>> 2009-10-14 11:09:53,090 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
>>> 2009-10-14 11:09:53,094 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
>>> 2009-10-14 11:09:53,094 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
>>> 2009-10-14 11:09:53,094 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /hadoop/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
>>> 2009-10-14 11:09:53,098 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
>>> 2009-10-14 11:09:53,113 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 72 msecs
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
>>> 2009-10-14 11:09:53,198 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
>>> 2009-10-14 11:09:53,242 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
>>> 2009-10-14 11:09:53,243 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
>>> 2009-10-14 11:09:53,243 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
>>> 2009-10-14 11:09:53,243 INFO org.mortbay.log: jetty-6.1.14
>>> 2009-10-14 11:09:53,433 INFO org.mortbay.log: Started [email protected]:50070
>>> 2009-10-14 11:09:53,434 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070
>>> 2009-10-14 11:09:53,434 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
>>> 2009-10-14 11:09:53,435 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: starting
>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020: starting
>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020: starting
>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020: starting
>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020: starting
>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020: starting
>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 8020: starting
>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020: starting
>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020: starting
>>> 2009-10-14 11:09:53,441 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020: starting
>>> 2009-10-14 11:09:53,444 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020: starting
>>> 2009-10-14 11:09:54,232 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
>>> java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
>>>         at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
>>>         at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
>>>         at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>> 2009-10-14 11:09:54,667 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.3:50010 storage DS-551038728-192.168.76.3-50010-1255511394664
>>> 2009-10-14 11:09:54,668 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.3:50010
>>> 2009-10-14 11:09:54,673 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.2:50010 storage DS-681115827-192.168.76.2-50010-1255511394673
>>> 2009-10-14 11:09:54,674 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.2:50010
>>> 2009-10-14 11:09:54,689 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.7:50010 storage DS-1731752631-192.168.76.7-50010-1255511394688
>>> 2009-10-14 11:09:54,689 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.7:50010
>>> 2009-10-14 11:09:54,701 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.6:50010 storage DS-338163265-192.168.76.6-50010-1255511394698
>>> 2009-10-14 11:09:54,701 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.6:50010
>>> 2009-10-14 11:09:59,688 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.1:50010 storage DS-988595128-192.168.76.1-50010-1255511399688
>>> 2009-10-14 11:09:59,689 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.1:50010
>>> 2009-10-14 11:10:04,277 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.9:50010 storage DS-1096719686-192.168.76.9-50010-1255511404276
>>> 2009-10-14 11:10:04,278 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.9:50010
>>> 2009-10-14 11:10:04,289 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.5:50010 storage DS-1405051412-192.168.76.5-50010-1255511404288
>>> 2009-10-14 11:10:04,289 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.5:50010
>>> 2009-10-14 11:10:05,961 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.8:50010 storage DS-1138617755-192.168.76.8-50010-1255511405960
>>> 2009-10-14 11:10:05,961 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.8:50010
>>> 2009-10-14 11:10:06,154 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.76.4:50010 storage DS-569548170-192.168.76.4-50010-1255511406153
>>> 2009-10-14 11:10:06,154 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.76.4:50010
>>> 2009-10-14 11:10:14,480 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 0 time(s).
>>> 2009-10-14 11:10:35,483 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 1 time(s).
>>> 2009-10-14 11:10:56,486 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 2 time(s).
>>> 2009-10-14 11:11:17,489 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 3 time(s).
>>> 2009-10-14 11:11:38,492 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 4 time(s).
>>> 2009-10-14 11:11:59,495 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 5 time(s).
>>> 2009-10-14 11:12:20,498 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 6 time(s).
>>> 2009-10-14 11:12:41,501 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 7 time(s).
>>> 2009-10-14 11:13:02,504 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 8 time(s).
>>> 2009-10-14 11:13:23,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 9 time(s).
>>> etc.
>>>
>>
>
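One detail in the log that may be relevant to the "config error" hunt:
the startup banner reports host = cluma.gbif.org/192.38.28.77, while the
namenode binds, and the IPC client keeps retrying, against
cluma.gbif.clu/192.168.76.254:8020. A minimal sketch for comparing the
configured address with what those two names resolve to, assuming a
stock 0.20-style conf layout (the conf path is illustrative):

    # In 0.20.x the namenode RPC address comes from fs.default.name:
    grep -A 1 'fs.default.name' $HADOOP_HOME/conf/core-site.xml
    # Compare how the two hostnames seen in the log resolve on this box:
    getent hosts cluma.gbif.org
    getent hosts cluma.gbif.clu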
