Hi Todd, Brian

Thanks for your comments.
I got to the bottom of the problem.  A netstat showed me that all the
DNs could connect to the NN, but the firewall was stopping the NN from
completing the HTTP handshake to itself.
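In case it helps anyone else, the check boils down to testing whether the NameNode host can open a TCP connection to its own ports. This is a rough sketch, not anything from Hadoop itself; ports 8020 and 50070 are taken from the log below, and the helper name is my own:

```shell
# Hypothetical helper: test whether this host can open a TCP connection to
# one of its own local ports, mirroring the firewall check that uncovered
# the problem (the NN could not reach its own HTTP port).
check_port() {
  local port=$1
  # bash's /dev/tcp pseudo-device attempts a real TCP connect
  if timeout 2 bash -c "exec 3<>/dev/tcp/127.0.0.1/$port" 2>/dev/null; then
    echo "port $port reachable"
  else
    echo "port $port blocked or not listening"
  fi
}

check_port 8020    # NameNode RPC port (from the log)
check_port 50070   # NameNode HTTP port (the one the firewall was blocking)
```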

It's all up and running now.
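For anyone hitting the same symptom, Todd's jps/jstack suggestion below can be scripted roughly like this; the awk pattern and output path are my own assumptions, not Hadoop defaults (jps and jstack ship with the JDK):

```shell
# Sketch: find the NameNode's PID with jps, then dump its stack with jstack
# to see where it is hung. Assumes jps/jstack are on the PATH.
pid=$(jps 2>/dev/null | awk '/NameNode/ {print $1}')
if [ -n "$pid" ]; then
  jstack "$pid" > /tmp/namenode-stack.txt
  echo "stack written to /tmp/namenode-stack.txt"
else
  echo "NameNode not running (or jps not on PATH)"
fi
```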

Thanks,
Tim


On Thu, Oct 15, 2009 at 10:23 AM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Tim,
>
> jps should show it running if it's "hung".
>
> In that case, see if you can get a stack trace using "jstack <pid>".
> Paste that trace and we may be able to figure out where it's hung up.
>
> -Todd
>
> On Thu, Oct 15, 2009 at 12:12 AM, tim robertson
> <timrobertson...@gmail.com> wrote:
>> Thanks for the info Todd,
>>
>>> I have also seen this error, but it's not fatal.
>> Ok, I'll ignore it for the time being.
>>
>>> Is this log from just the NameNode or did you tail multiple logs? It
>>> seems odd that your namenode would be trying to make an IPC client to
>>> itself (port 8020).
>>
>> Just the name node.
>>
>>> After you see these messages, does your namenode shut down?
>>
>> It kind of just hangs, really ("Already tried 9 time(s)", "Already
>> tried 10 time(s)", etc.)
>>
>>> Does jps show it running? Is the web interface available at port 50070?
>>
>> No, it never is available.
>>
>> I'll keep hunting for some config error.
>>
>> Cheers
>> Tim
>>
>>
>>>
>>> -Todd
>>>
>>>
>>> On Wed, Oct 14, 2009 at 2:33 AM, tim robertson
>>> <timrobertson...@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> Using hadoop 0.20.1 I am seeing the following on my namenode startup.
>>>>
>>>> 2009-10-14 11:09:54,232 INFO org.apache.hadoop.ipc.Server: Error
>>>> register getProtocolVersion
>>>> java.lang.IllegalArgumentException: Duplicate 
>>>> metricsName:getProtocolVersion
>>>>
>>>> Could someone please point me in the right direction for diagnosing
>>>> where I have gone wrong?
>>>>
>>>> Thanks
>>>> Tim
>>>>
>>>>
>>>>
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = cluma.gbif.org/192.38.28.77
>>>> STARTUP_MSG:   args = []
>>>> STARTUP_MSG:   version = 0.20.1
>>>> STARTUP_MSG:   build =
>>>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1
>>>> -r 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009
>>>> ************************************************************/
>>>> 2009-10-14 11:09:53,010 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>>>> Initializing RPC Metrics with hostName=NameNode, port=8020
>>>> 2009-10-14 11:09:53,013 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>>>> cluma.gbif.clu/192.168.76.254:8020
>>>> 2009-10-14 11:09:53,015 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>> Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>> 2009-10-14 11:09:53,019 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
>>>> Initializing NameNodeMeterics using context
>>>> object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2009-10-14 11:09:53,056 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
>>>> 2009-10-14 11:09:53,057 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> supergroup=supergroup
>>>> 2009-10-14 11:09:53,057 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isPermissionEnabled=true
>>>> 2009-10-14 11:09:53,064 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>>>> Initializing FSNamesystemMetrics using context
>>>> object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2009-10-14 11:09:53,065 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStatusMBean
>>>> 2009-10-14 11:09:53,090 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
>>>> 2009-10-14 11:09:53,094 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
>>>> construction = 0
>>>> 2009-10-14 11:09:53,094 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94
>>>> loaded in 0 seconds.
>>>> 2009-10-14 11:09:53,094 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Edits file
>>>> /hadoop/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
>>>> 2009-10-14 11:09:53,098 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94
>>>> saved in 0 seconds.
>>>> 2009-10-14 11:09:53,113 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
>>>> FSImage in 72 msecs
>>>> 2009-10-14 11:09:53,114 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of
>>>> blocks = 0
>>>> 2009-10-14 11:09:53,114 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
>>>> blocks = 0
>>>> 2009-10-14 11:09:53,114 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>>>> under-replicated blocks = 0
>>>> 2009-10-14 11:09:53,114 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>>>> over-replicated blocks = 0
>>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange:
>>>> STATE* Leaving safe mode after 0 secs.
>>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange:
>>>> STATE* Network topology has 0 racks and 0 datanodes
>>>> 2009-10-14 11:09:53,114 INFO org.apache.hadoop.hdfs.StateChange:
>>>> STATE* UnderReplicatedBlocks has 0 blocks
>>>> 2009-10-14 11:09:53,198 INFO org.mortbay.log: Logging to
>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>>> org.mortbay.log.Slf4jLog
>>>> 2009-10-14 11:09:53,242 INFO org.apache.hadoop.http.HttpServer: Port
>>>> returned by webServer.getConnectors()[0].getLocalPort() before open()
>>>> is -1. Opening the listener on 50070
>>>> 2009-10-14 11:09:53,243 INFO org.apache.hadoop.http.HttpServer:
>>>> listener.getLocalPort() returned 50070
>>>> webServer.getConnectors()[0].getLocalPort() returned 50070
>>>> 2009-10-14 11:09:53,243 INFO org.apache.hadoop.http.HttpServer: Jetty
>>>> bound to port 50070
>>>> 2009-10-14 11:09:53,243 INFO org.mortbay.log: jetty-6.1.14
>>>> 2009-10-14 11:09:53,433 INFO org.mortbay.log: Started
>>>> selectchannelconnec...@0.0.0.0:50070
>>>> 2009-10-14 11:09:53,434 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
>>>> 0.0.0.0:50070
>>>> 2009-10-14 11:09:53,434 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> Responder: starting
>>>> 2009-10-14 11:09:53,435 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> listener on 8020: starting
>>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 0 on 8020: starting
>>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 1 on 8020: starting
>>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 2 on 8020: starting
>>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 3 on 8020: starting
>>>> 2009-10-14 11:09:53,438 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 4 on 8020: starting
>>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 6 on 8020: starting
>>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 7 on 8020: starting
>>>> 2009-10-14 11:09:53,439 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 8 on 8020: starting
>>>> 2009-10-14 11:09:53,441 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 9 on 8020: starting
>>>> 2009-10-14 11:09:53,444 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 5 on 8020: starting
>>>> 2009-10-14 11:09:54,232 INFO org.apache.hadoop.ipc.Server: Error
>>>> register getProtocolVersion
>>>> java.lang.IllegalArgumentException: Duplicate 
>>>> metricsName:getProtocolVersion
>>>>        at 
>>>> org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
>>>>        at 
>>>> org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
>>>>        at 
>>>> org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
>>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
>>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>>> 2009-10-14 11:09:54,667 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.3:50010 storage
>>>> DS-551038728-192.168.76.3-50010-1255511394664
>>>> 2009-10-14 11:09:54,668 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.3:50010
>>>> 2009-10-14 11:09:54,673 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.2:50010 storage
>>>> DS-681115827-192.168.76.2-50010-1255511394673
>>>> 2009-10-14 11:09:54,674 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.2:50010
>>>> 2009-10-14 11:09:54,689 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.7:50010 storage
>>>> DS-1731752631-192.168.76.7-50010-1255511394688
>>>> 2009-10-14 11:09:54,689 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.7:50010
>>>> 2009-10-14 11:09:54,701 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.6:50010 storage
>>>> DS-338163265-192.168.76.6-50010-1255511394698
>>>> 2009-10-14 11:09:54,701 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.6:50010
>>>> 2009-10-14 11:09:59,688 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.1:50010 storage
>>>> DS-988595128-192.168.76.1-50010-1255511399688
>>>> 2009-10-14 11:09:59,689 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.1:50010
>>>> 2009-10-14 11:10:04,277 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.9:50010 storage
>>>> DS-1096719686-192.168.76.9-50010-1255511404276
>>>> 2009-10-14 11:10:04,278 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.9:50010
>>>> 2009-10-14 11:10:04,289 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.5:50010 storage
>>>> DS-1405051412-192.168.76.5-50010-1255511404288
>>>> 2009-10-14 11:10:04,289 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.5:50010
>>>> 2009-10-14 11:10:05,961 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.8:50010 storage
>>>> DS-1138617755-192.168.76.8-50010-1255511405960
>>>> 2009-10-14 11:10:05,961 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.8:50010
>>>> 2009-10-14 11:10:06,154 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 192.168.76.4:50010 storage
>>>> DS-569548170-192.168.76.4-50010-1255511406153
>>>> 2009-10-14 11:10:06,154 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/192.168.76.4:50010
>>>> 2009-10-14 11:10:14,480 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 0
>>>> time(s).
>>>> 2009-10-14 11:10:35,483 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 1
>>>> time(s).
>>>> 2009-10-14 11:10:56,486 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 2
>>>> time(s).
>>>> 2009-10-14 11:11:17,489 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 3
>>>> time(s).
>>>> 2009-10-14 11:11:38,492 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 4
>>>> time(s).
>>>> 2009-10-14 11:11:59,495 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 5
>>>> time(s).
>>>> 2009-10-14 11:12:20,498 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 6
>>>> time(s).
>>>> 2009-10-14 11:12:41,501 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 7
>>>> time(s).
>>>> 2009-10-14 11:13:02,504 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 8
>>>> time(s).
>>>> 2009-10-14 11:13:23,506 INFO org.apache.hadoop.ipc.Client: Retrying
>>>> connect to server: cluma.gbif.clu/192.168.76.254:8020. Already tried 9
>>>> time(s).
>>>> etc.
>>>>
>>>
>>
>
