Re: Entropy Pool and HDFS FS Commands Hanging System
Todd, I have attached the jstack pid of namenode output. Does it appear to be stuck in SecureRandom as you noted as a possibility? I am not sure whether this is indicated in the following output:

sh-4.1# jps
4038 JobTracker
4160 Jps
3917 DataNode
4121 TaskTracker
3844 NameNode
3992 SecondaryNameNode
sh-4.1# jstack 3844
2011-01-03 15:07:01
Full thread dump OpenJDK Zero VM (14.0-b16 interpreted mode):

Attach Listener daemon prio=10 tid=0x0021a870 nid=0x106e waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

3299...@qtp0-1 prio=10 tid=0x6ff2cee8 nid=0x1039 in Object.wait() [0x6f2fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x7dcb46a8 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
        - locked 0x7dcb46a8 (a org.mortbay.thread.QueuedThreadPool$PoolThread)

15020...@qtp0-0 prio=10 tid=0x6ff2ddd8 nid=0x1038 in Object.wait() [0x6f47e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x7dcb4718 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
        - locked 0x7dcb4718 (a org.mortbay.thread.QueuedThreadPool$PoolThread)

org.apache.hadoop.hdfs.server.namenode.decommissionmanager$moni...@955cd5 daemon prio=10 tid=0x6ff036f8 nid=0xffe waiting on condition [0x6f68e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
        at java.lang.Thread.run(Thread.java:636)

org.apache.hadoop.hdfs.server.namenode.fsnamesystem$replicationmoni...@25c828 daemon prio=10 tid=0x6ff02230 nid=0xff9 waiting on condition [0x6f80e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2327)
        at java.lang.Thread.run(Thread.java:636)

org.apache.hadoop.hdfs.server.namenode.leasemanager$moni...@22ab57 daemon prio=10 tid=0x6ff00e00 nid=0xff8 waiting on condition [0x6f98e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:350)
        at java.lang.Thread.run(Thread.java:636)

org.apache.hadoop.hdfs.server.namenode.fsnamesystem$heartbeatmoni...@b1074a daemon prio=10 tid=0x6ff009b0 nid=0xff7 waiting on condition [0x6fb0e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2309)
        at java.lang.Thread.run(Thread.java:636)

org.apache.hadoop.hdfs.server.namenode.pendingreplicationblocks$pendingreplicationmoni...@165f738 daemon prio=10 tid=0x001f66e8 nid=0xff6 waiting on condition [0x6fc9e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186)
        at java.lang.Thread.run(Thread.java:636)

Low Memory Detector daemon prio=10 tid=0x000c09a8 nid=0xf50 runnable [0x]
   java.lang.Thread.State: RUNNABLE

Signal Dispatcher daemon prio=10 tid=0x000bf1b8 nid=0xf4f runnable [0x]
   java.lang.Thread.State: RUNNABLE

Finalizer daemon prio=10 tid=0x000af298 nid=0xf48 in Object.wait() [0x7063e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x7daf8b40 (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
        - locked 0x7daf8b40 (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

Reference Handler daemon prio=10 tid=0x000aaa08 nid=0xf47 in Object.wait() [0x707be000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x7daf8bc8 (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked 0x7daf8bc8 (a java.lang.ref.Reference$Lock)

main prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:236)
        at
Re: Entropy Pool and HDFS FS Commands Hanging System
Yes. It is stuck as suggested. See the bolded lines.

You can help avoid this by dumping additional entropy into the machine via network traffic. According to the man page for /dev/random you can cheat by writing goo into /dev/urandom, but I have been unable to verify that by experiment.

Is it really necessary to use /dev/random here? Again from the man page, there is a strong feeling in the community that only very long lived, high value keys really need to read from /dev/random. Session keys from /dev/urandom are fine.

I wrote an adaptation of the secure seed generator that doesn't block for Mahout. It is trivial, but might be useful to copy:

http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java

On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman jon2...@gmail.com wrote:

I have attached the jstack pid of namenode output. Does it appear to be stuck in SecureRandom as you noted as a possibility? I am not sure whether this is indicated in the following output: ...
main prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
        *at java.io.FileInputStream.readBytes(Native Method)
        *at java.io.FileInputStream.read(FileInputStream.java:236)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0x70e59ae8 (a java.io.BufferedInputStream)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0x70e59970 (a java.io.BufferedInputStream)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
        at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
        at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
        *at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
        *at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
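The idea behind the linked non-blocking seed generator can be sketched in a few lines: pull seed bytes straight from /dev/urandom, which never blocks, instead of letting SecureRandom seed itself through the blocking SeedGenerator shown in the stack above. This is a minimal illustration, not the actual Mahout code; the class and method names are invented, and it assumes a Linux/Unix system.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Minimal sketch of a /dev/urandom-backed seed source (names invented
 * for illustration; Linux/Unix only). Reads from /dev/urandom return
 * immediately regardless of how depleted the kernel entropy pool is.
 */
public class UrandomSeed {
    public static byte[] generateSeed(int numBytes) {
        byte[] seed = new byte[numBytes];
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream("/dev/urandom"))) {
            in.readFully(seed); // never blocks, unlike /dev/random
        } catch (IOException e) {
            throw new RuntimeException("cannot read /dev/urandom", e);
        }
        return seed;
    }

    public static void main(String[] args) {
        // Seeding SecureRandom explicitly skips the blocking default SeedGenerator.
        java.security.SecureRandom rng =
                new java.security.SecureRandom(generateSeed(20));
        System.out.println("first int: " + rng.nextInt());
    }
}
```

Passing a seed to the SecureRandom constructor is what avoids the blocked code path: the default no-argument constructor is what ends up in sun.security.provider.SeedGenerator.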
Re: Entropy Pool and HDFS FS Commands Hanging System
Hi Ted,

Could you give me a bit more information on how I can overcome this issue? I am running Hadoop on an embedded processor, and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system? And are there some variables or configuration options I can set to avoid any potential blocking behavior?

Thanks.

-Jon

On Jan 3, 2011, at 3:48 PM, Ted Dunning wrote:

Yes. It is stuck as suggested. See the bolded lines. You can help avoid this by dumping additional entropy into the machine via network traffic. According to the man page for /dev/random you can cheat by writing goo into /dev/urandom, but I have been unable to verify that by experiment. Is it really necessary to use /dev/random here? Again from the man page, there is a strong feeling in the community that only very long lived, high value keys really need to read from /dev/random. Session keys from /dev/urandom are fine. I wrote an adaptation of the secure seed generator that doesn't block for Mahout. It is trivial, but might be useful to copy: http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java

On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman jon2...@gmail.com wrote:

I have attached the jstack pid of namenode output. Does it appear to be stuck in SecureRandom as you noted as a possibility? I am not sure whether this is indicated in the following output: ...
main prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
        *at java.io.FileInputStream.readBytes(Native Method)
        *at java.io.FileInputStream.read(FileInputStream.java:236)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0x70e59ae8 (a java.io.BufferedInputStream)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0x70e59970 (a java.io.BufferedInputStream)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
        at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
        at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
        *at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
        *at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
Re: Entropy Pool and HDFS FS Commands Hanging System
try

dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time.

There is no way that I know of to change the behavior of /dev/random except by changing the file itself to point to a different minor device. That would be very bad form.

One thing you may be able to do is to pour lots of entropy into the system via /dev/urandom. I was not able to demonstrate this, though, when I just tried that.

It would be nice if there were a config variable to set that would change this behavior, but right now, a code change is required (AFAIK).

Another thing to do is replace the use of SecureRandom with a version that uses /dev/urandom. That is the point of the code that I linked to. It provides a plugin replacement that will not block.

On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman jon2...@gmail.com wrote:

Could you give me a bit more information on how I can overcome this issue? I am running Hadoop on an embedded processor and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system? And are there some variables or configuration options I can set to avoid any potential blocking behavior?
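Besides the dd test above, you can check the kernel's entropy counter directly without risking a hung shell. A hypothetical helper along these lines (the class name is invented for illustration; Linux only, since it reads /proc):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Non-blocking check of the kernel entropy pool (Linux only; class name
 * invented for illustration). A value near zero means reads from
 * /dev/random -- and anything seeded from it, such as SecureRandom's
 * default SeedGenerator -- are likely to block.
 */
public class EntropyCheck {
    public static int availableEntropy() {
        try {
            String s = new String(Files.readAllBytes(
                    Paths.get("/proc/sys/kernel/random/entropy_avail")));
            return Integer.parseInt(s.trim());
        } catch (IOException e) {
            throw new RuntimeException("no /proc entropy counter; not Linux?", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("entropy_avail = " + availableEntropy());
    }
}
```

On a quiet embedded box with no network traffic, this number typically stays very low, which is consistent with the blocked FileInputStream.readBytes frame in the jstack output.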
Re: Entropy Pool and HDFS FS Commands Hanging System
Thanks. Will try that.

One final question, based on the jstack output I sent: is it obvious that the system is blocked due to the behavior of /dev/random? That is, can you enlighten me as to the output I sent that explicitly or implicitly indicates the blocking? I am trying to understand whether this is in fact the problem or whether there could be some other issue.

If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?

Thanks in advance.

-Jon

On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:

try

dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time. There is no way that I know of to change the behavior of /dev/random except by changing the file itself to point to a different minor device. That would be very bad form. One thing you may be able to do is to pour lots of entropy into the system via /dev/urandom. I was not able to demonstrate this, though, when I just tried that. It would be nice if there were a config variable to set that would change this behavior, but right now, a code change is required (AFAIK). Another thing to do is replace the use of SecureRandom with a version that uses /dev/urandom. That is the point of the code that I linked to. It provides a plugin replacement that will not block.

On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman jon2...@gmail.com wrote:

Could you give me a bit more information on how I can overcome this issue? I am running Hadoop on an embedded processor and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system? And are there some variables or configuration options I can set to avoid any potential blocking behavior?
Re: Entropy Pool and HDFS FS Commands Hanging System
On Mon, Jan 3, 2011 at 4:48 PM, Jon Lederman jon2...@gmail.com wrote:

Thanks. Will try that. One final question, based on the jstack output I sent, is it obvious that the system is blocked due to the behavior of /dev/random?

I tried to send you a highlighted markup of your jstack output. The key thing to look for is some thread reading bytes in a call chain that nests from SecureRandom.

If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?

It depends on how quiet your machine is. If it has stuff happening, then it will unwedge eventually.
Re: Entropy Pool and HDFS FS Commands Hanging System
Another possibility to fix it is to install rng-tools, which will allow you to increase the amount of entropy in your system.

--
Take care,
Konstantin (Cos) Boudnik

On Mon, Jan 3, 2011 at 16:48, Jon Lederman jon2...@gmail.com wrote:

Thanks. Will try that. One final question, based on the jstack output I sent: is it obvious that the system is blocked due to the behavior of /dev/random? That is, can you enlighten me as to the output I sent that explicitly or implicitly indicates the blocking? I am trying to understand whether this is in fact the problem or whether there could be some other issue. If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?

Thanks in advance.

-Jon

On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:

try

dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time. There is no way that I know of to change the behavior of /dev/random except by changing the file itself to point to a different minor device. That would be very bad form. One thing you may be able to do is to pour lots of entropy into the system via /dev/urandom. I was not able to demonstrate this, though, when I just tried that. It would be nice if there were a config variable to set that would change this behavior, but right now, a code change is required (AFAIK). Another thing to do is replace the use of SecureRandom with a version that uses /dev/urandom. That is the point of the code that I linked to. It provides a plugin replacement that will not block.

On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman jon2...@gmail.com wrote:

Could you give me a bit more information on how I can overcome this issue? I am running Hadoop on an embedded processor and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system? And are there some variables or configuration options I can set to avoid any potential blocking behavior?
Re: HDFS FS Commands Hanging System
Did you set your config and format the namenode as per these instructions?

http://hadoop.apache.org/common/docs/current/single_node_setup.html

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
Re: HDFS FS Commands Hanging System
Hi,

I followed the example precisely. It seems to me that the NameNode and DataNode are not communicating. I noticed that the log file for my DataNode appears suspiciously short. I believe it should try to connect to the NameNode and report such progress. The log for the DataNode simply shows:

/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/

Also, the log file for the NameNode indicates 0 racks and 0 DataNodes as indicated in bold:

/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2011-01-02 16:30:35,093 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:9000
2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-02 16:30:35,196 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,022 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-02 16:30:37,029 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-02 16:30:37,032 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-02 16:30:37,216 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,242 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-02 16:30:38,104 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1726 msecs
2011-01-02 16:30:38,130 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-02 16:30:38,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-02 16:30:38,136 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-02 16:30:38,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14

What should I check to see whether there is communication? Why should the network topology as reported by the NameNode indicate 0 racks and 0 DataNodes?

Also, I am curious what should be in the masters and slaves files when running in pseudo-distributed mode. It seems I need to have both files contain: localhost. Otherwise, the DataNode and/or NameNode do not start. Any
Re: HDFS FS Commands Hanging System
Hello Jon,

Could you please verify that your node can resolve the host name? It would be helpful too if you can attach your configuration files and the output of:

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

as Todd suggested.

Cheers,
esteban

On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote:

Hi,

Still no luck in getting FS commands to work. I did take a look at the logs. They all look pretty clean with the following exceptions: the DataNode appears to start up fine, but the NameNode reports that the Network Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the namenode cannot talk to the datanode? Any thoughts on what might be wrong?

Thanks in advance and happy new year.

-Jon

2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/

sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-01 19:45:29,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-01 19:45:29,965 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,994 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-01 19:45:30,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-01 19:45:30,696 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-01 19:45:30,701 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-01 19:45:30,708 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-01 19:45:30,767 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-01 19:45:30,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1701 msecs
2011-01-01 19:45:30,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-01 19:45:30,948 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-01 19:45:30,958 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-01 19:45:30,963 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070
Re: HDFS FS Commands Hanging System
Hi Esteban,

Thanks. Can you tell me how I can check whether my node can resolve the host name? I don't know precisely how to do that.

When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / I get:

# HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.init(Configuration.java:211)
        at org.apache.hadoop.conf.Configuration.init(Configuration.java:198)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is6ms.
11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1

Then the system hangs and does not return.

My core-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

My hdfs-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

My mapred-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

My masters and slaves files both indicate: localhost

Thanks for your help. I really appreciate this.

-Jon

On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:

Hello Jon, Could you please verify that your node can resolve the host name? It would be helpful too if you can attach your configuration files and the output of: HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / as Todd suggested. Cheers, esteban

On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote:

Hi, Still no luck in getting FS commands to work. I did take a look at the logs. They all look pretty clean with the following exceptions: the DataNode appears to start up fine, but the NameNode reports that the Network Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the namenode cannot talk to the datanode? Any thoughts on what might be wrong? Thanks in advance and happy new year. -Jon

2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/

sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-01
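For the host-name resolution question above, a few lines of Java are enough to check what the HDFS client will see (the class name here is invented for illustration): if this lookup hangs or throws, the hdfs://localhost:9000 address in core-site.xml cannot be resolved either.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical quick check (class name invented for illustration) that a
 * host name used in fs.default.name actually resolves on this node.
 */
public class ResolveCheck {
    public static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            throw new RuntimeException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = resolve("localhost");
        System.out.println("localhost -> " + addr.getHostAddress());
    }
}
```

Outside the JVM, the same check is a one-liner with ping or getent, but this uses the same resolver path as Hadoop's IPC client.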
Re: HDFS FS Commands Hanging System
Could this be a java/OS issue? Which java and OS are you using? Hari On Sunday, January 2, 2011, Jon Lederman jon2...@gmail.com wrote: Hi Esteban, Thanks. Can you tell me how I can check whether my node can resolve the host name? I don't know precisely how to do that. When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / I get: # HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / 11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config() at org.apache.hadoop.conf.Configuration.init(Configuration.java:211) at org.apache.hadoop.conf.Configuration.init(Configuration.java:198) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880) 11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root 11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root 11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is6ms. 11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1 Then the system hangs and does not return. My core-site.xml file is as follows: ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namefs.default.name/name valuehdfs://localhost:9000/value /property /configuration My hdfs-site.xml file is as follows: ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namedfs.replication/name value1/value /property /configuration My mapred-site.xml file is as follows: ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? 
!-- Put site-specific property overrides in this file. -- configuration property namemapred.job.tracker/name valuelocalhost:9001/value /property /configuration My masters and slaves files both indicate: localhost Thanks for your help. I really appreciate this. -Jon On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote: Hello Jon, Could you please verify that your node can resolve the host name? It would be helpful too if you can attach your configuration files and the output of: HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / as Todd suggested. Cheers, esteban On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote: Hi, Still no luck in getting FS commands to work. I did take a look at the logs. They all look pretty clean with the following exceptions: The DataNode appears to start up fine. However, the NameNode reports that the Network Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the namenode cannot talk to the datanode? Any thoughts on what might be wrong? Thanks in advance and happy new year. 
-Jon

2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO
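[Editor's note] To answer the host-name question Jon asks above, a quick check can be sketched as follows. This is not from the thread; it assumes a typical Linux userland, and on a stripped-down embedded system getent may be absent, in which case the resolver's static table /etc/hosts can be inspected directly.

```shell
# Quick host-name resolution checks (assumes a typical Linux userland).
# getent hosts localhost      # should print a loopback address such as 127.0.0.1
# Fallback when getent is unavailable: check the resolver's static table.
grep -w localhost /etc/hosts
```

If neither prints a loopback entry for localhost, the client could not even have logged "Connecting to localhost/127.0.0.1:9000", so the debug output above suggests resolution itself is working here.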
Re: HDFS FS Commands Hanging System
If you're using Java 1.6.0_18, avoid it and switch to a more recent release; for the reasons, see http://wiki.apache.org/hadoop/HadoopJavaVersions. I don't think that release is the real cause of the issue here, but it's worth ruling it out before digging deeper.

On Fri, Dec 31, 2010 at 10:30 PM, Jon Lederman jon2...@gmail.com wrote:

Hi All,

I have been working on running Hadoop in pseudo-distributed mode on a new microprocessor architecture. I have been successful in getting SSH configured, and I am able to start a NameNode, SecondaryNameNode, TaskTracker, JobTracker, and DataNode, as evidenced by the output of jps. However, when I attempt to interact with the file system in any way, even with a simple command such as hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what to look into to fix this problem?

Thanks in advance.

-Jon

--
Harsh J
www.harshj.com
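[Editor's note] A startup script can guard against that exact release. The snippet below is an illustration, not something from the wiki page: the version string is a hard-coded sample so the parsing logic runs on its own, and in practice it would be replaced by the output of java -version 2>&1.

```shell
# Detect the known-bad 1.6.0_18 JVM release from its version string.
# Sample input; in a real script: ver=$(java -version 2>&1 | head -n 1)
ver='java version "1.6.0_18"'
update=$(printf '%s\n' "$ver" | sed -n 's/.*"1\.6\.0_\([0-9]*\)".*/\1/p')
if [ "$update" = "18" ]; then
  echo "JVM 1.6.0_18 detected - switch to a newer release"
fi
```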
Re: HDFS FS Commands Hanging System
Hi Jon,

I was able to reproduce your error by shutting down HDFS and setting up nc to listen for connections on the same port (9000). Could you please verify that port 9000 is being used by the right process (the NameNode)? The PIDs from fuser -n tcp 9000 and jps | grep NameNode should be the same.

esteban.

On Sun, Jan 2, 2011 at 10:56, Jon Lederman jon2...@gmail.com wrote:

Hi Esteban,

Thanks. Can you tell me how I can check whether my node can resolve the host name? I don't know precisely how to do that.

When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / I get:

# HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is6ms.
11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1

Then the system hangs and does not return.

My core-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

My hdfs-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

My mapred-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

My masters and slaves files both contain: localhost

Thanks for your help. I really appreciate this.

-Jon

On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:

Hello Jon,

Could you please verify that your node can resolve the host name? It would also be helpful if you could attach your configuration files and the output of HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / as Todd suggested.

Cheers,
esteban

On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote:

Hi,

Still no luck in getting FS commands to work. I did take a look at the logs, and they all look pretty clean, with the following exceptions: the DataNode appears to start up fine, but the NameNode reports that the network topology has 0 racks and 0 datanodes. Is this normal? Is it possible the NameNode cannot talk to the DataNode? Any thoughts on what might be wrong?

Thanks in advance, and happy new year.
-Jon

2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM
Re: HDFS FS Commands Hanging System
Hi Esteban,

Thanks for your response. I don't have the fuser executable installed in the environment I am running on. However, I do find the following:

# jps
923 JobTracker
870 SecondaryNameNode
1188 Jps
794 DataNode
996 TaskTracker
727 NameNode

# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address              Foreign Address State
tcp        0      0 (null):sunrpc              (null):*        LISTEN
tcp        0      0 (null):ssh                 (null):*        LISTEN
tcp        2      0 localhost.localdomain:9000 :::*            LISTEN
tcp        0      0 localhost.localdomain:9001 :::*            LISTEN
tcp        0      0 ::%989480:50060            :::*            LISTEN
tcp        0      0 ::%989704:50030            :::*            LISTEN
tcp        0      0 ::%989480:50070            :::*            LISTEN
tcp        0      0 ::%989480:telnet           :::*            LISTEN
udp        0      0 (null):sunrpc              (null):*
Active UNIX domain sockets (only servers)
Proto RefCnt Flags   Type   State     I-Node Path
unix  2      [ ACC ] STREAM LISTENING 1281   @MONITOR_617_1
#

So, all of the daemons are running. Please note the following from my log files. When I look at them, the NameNode on startup indicates:

Network topology has 0 racks and 0 datanodes

Also, my DataNode startup log is suspiciously short, containing only:

/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

There is no attempt from the DataNode to communicate or otherwise establish communication with the NameNode. It appears to me that the NameNode and DataNode aren't communicating, which may be the source of my problem. However, I don't know why this would be, or how to debug it, since I am not sure of the internal operation of Hadoop. Any thoughts on all of this?

Thanks in advance.

-Jon

On Jan 2, 2011, at 2:05 PM, Esteban Gutierrez Moguel wrote:

Hi Jon,

I was able to reproduce your error by shutting down HDFS and setting up nc to listen for connections on the same port (9000). Could you please verify that port 9000 is being used by the right process (the NameNode)? The PIDs from fuser -n tcp 9000 and jps | grep NameNode should be the same.

esteban.

On Sun, Jan 2, 2011 at 10:56, Jon Lederman jon2...@gmail.com wrote:

Hi Esteban,

Thanks. Can you tell me how I can check whether my node can resolve the host name? I don't know precisely how to do that.

When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / I get:

# HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is6ms.
11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1

Then the system hangs and does not return.

My core-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

My hdfs-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

My mapred-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
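[Editor's note] Since fuser is not installed on Jon's system, Esteban's PID comparison can be done with whatever tool exists (lsof, netstat -ltnp, or /proc). Below is a sketch of the comparison itself; the command substitutions are commented out and replaced with a sample PID from the jps output above so the logic runs standalone, and the tool names are assumptions about the target system.

```shell
# Compare the NameNode PID from jps with the PID that owns TCP port 9000.
# On a full Linux userland these would populate the variables:
#   nn_pid=$(jps | awk '/ NameNode$/ {print $1}')
#   port_pid=$(fuser -n tcp 9000 2>/dev/null | tr -d ' ')   # or: netstat -ltnp
nn_pid=727     # sample value from the jps output above
port_pid=727   # substitute the real result on the target system
if [ "$nn_pid" = "$port_pid" ]; then
  echo "port 9000 belongs to the NameNode"
else
  echo "port 9000 is owned by some other process"
fi
```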
Re: HDFS FS Commands Hanging System
Hi,

Still no luck in getting FS commands to work. I did take a look at the logs, and they all look pretty clean, with the following exceptions: the DataNode appears to start up fine, but the NameNode reports that the network topology has 0 racks and 0 datanodes. Is this normal? Is it possible the NameNode cannot talk to the DataNode? Any thoughts on what might be wrong?

Thanks in advance, and happy new year.

-Jon

2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-01 19:45:29,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-01 19:45:29,965 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,994 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-01 19:45:30,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-01 19:45:30,696 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-01 19:45:30,701 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-01 19:45:30,708 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-01 19:45:30,767 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-01 19:45:30,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1701 msecs
2011-01-01 19:45:30,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-01 19:45:30,948 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-01 19:45:30,958 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-01 19:45:30,963 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-01-01 19:45:35,038 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-01 19:45:35,041 INFO org.mortbay.log: jetty-6.1.14
sc-ssh-svr1 logs $

On Dec 31, 2010, at 4:28 PM, li ping wrote:

I suggest you should look
HDFS FS Commands Hanging System
Hi All,

I have been working on running Hadoop in pseudo-distributed mode on a new microprocessor architecture. I have been successful in getting SSH configured, and I am able to start a NameNode, SecondaryNameNode, TaskTracker, JobTracker, and DataNode, as evidenced by the output of jps. However, when I attempt to interact with the file system in any way, even with a simple command such as hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what to look into to fix this problem?

Thanks in advance.

-Jon
Re: HDFS FS Commands Hanging System
Hi Michael,

Thanks for your response. It doesn't seem to be an issue with safe mode: even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between the daemons? What should I be looking at there? Or could it be something else? When I do jps, I do see all the daemons listed. Any other thoughts?

Thanks again, and happy new year.

-Jon

On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

Try checking your dfs status:

hadoop dfsadmin -safemode get

Probably says ON.

hadoop dfsadmin -safemode leave

Somebody else can probably say how to make this happen every reboot.

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems

From: Jon Lederman [mailto:jon2...@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:HDFS FS Commands Hanging System

Hi All,

I have been working on running Hadoop in pseudo-distributed mode on a new microprocessor architecture. I have been successful in getting SSH configured, and I am able to start a NameNode, SecondaryNameNode, TaskTracker, JobTracker, and DataNode, as evidenced by the output of jps. However, when I attempt to interact with the file system in any way, even with a simple command such as hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what to look into to fix this problem?

Thanks in advance.

-Jon
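[Editor's note] On Michael's "make this happen every reboot" aside: rather than running dfsadmin -safemode leave after each start, the NameNode's safe-mode exit threshold can be lowered in hdfs-site.xml. This is a sketch, not advice from the thread; dfs.safemode.threshold.pct is the 0.20-era property name, and 0.5f is an illustrative value only.

```xml
<!-- hdfs-site.xml sketch: with replication 1 on a single node, a lower -->
<!-- safe-mode threshold lets the NameNode exit safe mode without -->
<!-- waiting for a large fraction of blocks to be reported. -->
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.5f</value>
</property>
```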
Re: HDFS FS Commands Hanging System
Hi Jon,

Try:

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman jon2...@gmail.com wrote:

Hi Michael,

Thanks for your response. It doesn't seem to be an issue with safe mode: even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between the daemons? What should I be looking at there? Or could it be something else? When I do jps, I do see all the daemons listed. Any other thoughts?

Thanks again, and happy new year.

-Jon

On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

Try checking your dfs status:

hadoop dfsadmin -safemode get

Probably says ON.

hadoop dfsadmin -safemode leave

Somebody else can probably say how to make this happen every reboot.

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems

From: Jon Lederman [mailto:jon2...@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:HDFS FS Commands Hanging System

Hi All,

I have been working on running Hadoop in pseudo-distributed mode on a new microprocessor architecture. I have been successful in getting SSH configured, and I am able to start a NameNode, SecondaryNameNode, TaskTracker, JobTracker, and DataNode, as evidenced by the output of jps. However, when I attempt to interact with the file system in any way, even with a simple command such as hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what to look into to fix this problem?

Thanks in advance.

-Jon

--
Todd Lipcon
Software Engineer, Cloudera
Re: HDFS FS Commands Hanging System
I suggest you look through the logs to see if there are any errors. The second point I need to make concerns which node you run the command hadoop fs -ls on: if you run the command on node A, that node's configuration item fs.default.name should point to the HDFS NameNode.

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman jon2...@gmail.com wrote:

Hi Michael,

Thanks for your response. It doesn't seem to be an issue with safe mode: even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between the daemons? What should I be looking at there? Or could it be something else? When I do jps, I do see all the daemons listed. Any other thoughts?

Thanks again, and happy new year.

-Jon

On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

Try checking your dfs status:

hadoop dfsadmin -safemode get

Probably says ON.

hadoop dfsadmin -safemode leave

Somebody else can probably say how to make this happen every reboot.

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems

From: Jon Lederman [mailto:jon2...@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:HDFS FS Commands Hanging System

Hi All,

I have been working on running Hadoop in pseudo-distributed mode on a new microprocessor architecture. I have been successful in getting SSH configured, and I am able to start a NameNode, SecondaryNameNode, TaskTracker, JobTracker, and DataNode, as evidenced by the output of jps. However, when I attempt to interact with the file system in any way, even with a simple command such as hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what to look into to fix this problem?

Thanks in advance.

-Jon

--
李平
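[Editor's note] To double-check li ping's point on the client node, the filesystem URI the shell will use can be read straight out of core-site.xml. A rough sketch: the heredoc below stands in for the node's real conf/core-site.xml, and a real script should use an XML-aware tool rather than sed.

```shell
# Extract the fs.default.name URI from a core-site.xml. The heredoc is a
# stand-in for the node's actual conf/core-site.xml.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Crude extraction; fine for a quick check, not for general XML.
sed -n 's|.*<value>\(hdfs://[^<]*\)</value>.*|\1|p' /tmp/core-site-sample.xml
# prints: hdfs://localhost:9000
```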