Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Jon Lederman
Todd,

I have attached the output of jstack for the NameNode pid.  Does it appear to be 
stuck in SecureRandom, as you noted as a possibility?  I am not sure whether 
this is indicated in the following output:

sh-4.1# jps
4038 JobTracker
4160 Jps
3917 DataNode
4121 TaskTracker
3844 NameNode
3992 SecondaryNameNode

sh-4.1# jstack 3844
2011-01-03 15:07:01
Full thread dump OpenJDK Zero VM (14.0-b16 interpreted mode):
 
Attach Listener daemon prio=10 tid=0x0021a870 nid=0x106e waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE
 
3299...@qtp0-1 prio=10 tid=0x6ff2cee8 nid=0x1039 in Object.wait() [0x6f2fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x7dcb46a8 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
- locked 0x7dcb46a8 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
 
15020...@qtp0-0 prio=10 tid=0x6ff2ddd8 nid=0x1038 in Object.wait() [0x6f47e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x7dcb4718 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
- locked 0x7dcb4718 (a org.mortbay.thread.QueuedThreadPool$PoolThread)
 
org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor@955cd5 daemon prio=10 tid=0x6ff036f8 nid=0xffe waiting on condition [0x6f68e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
at java.lang.Thread.run(Thread.java:636)
 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor@25c828 daemon prio=10 tid=0x6ff02230 nid=0xff9 waiting on condition [0x6f80e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2327)
at java.lang.Thread.run(Thread.java:636)
 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@22ab57 daemon prio=10 tid=0x6ff00e00 nid=0xff8 waiting on condition [0x6f98e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:350)
at java.lang.Thread.run(Thread.java:636)
 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor@b1074a daemon prio=10 tid=0x6ff009b0 nid=0xff7 waiting on condition [0x6fb0e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2309)
at java.lang.Thread.run(Thread.java:636)
 
org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor@165f738 daemon prio=10 tid=0x001f66e8 nid=0xff6 waiting on condition [0x6fc9e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186)
at java.lang.Thread.run(Thread.java:636)
 
Low Memory Detector daemon prio=10 tid=0x000c09a8 nid=0xf50 runnable [0x]
   java.lang.Thread.State: RUNNABLE
 
Signal Dispatcher daemon prio=10 tid=0x000bf1b8 nid=0xf4f runnable [0x]
   java.lang.Thread.State: RUNNABLE
 
Finalizer daemon prio=10 tid=0x000af298 nid=0xf48 in Object.wait() [0x7063e000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x7daf8b40 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
- locked 0x7daf8b40 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
 
Reference Handler daemon prio=10 tid=0x000aaa08 nid=0xf47 in Object.wait() [0x707be000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x7daf8bc8 (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked 0x7daf8bc8 (a java.lang.ref.Reference$Lock)
 
main prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:236)
at ...

Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Ted Dunning
Yes.  It is stuck as suggested.  See the bolded lines.

You can help avoid this by dumping additional entropy into the machine via
network traffic.  According to the man page for /dev/random you can cheat by
writing goo into /dev/urandom, but I have been unable to verify that by
experiment.

Is it really necessary to use /dev/random here?  Again from the man page,
there is a strong feeling in the community that only very long lived, high
value keys really need to read from /dev/random.  Session keys from
/dev/urandom are fine.

I wrote an adaptation of the secure seed generator that doesn't block for
Mahout.  It is trivial, but might be useful to copy:
http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java
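
For reference, the core of that idea is tiny; a minimal sketch (this is not the actual Mahout class, and the names here are illustrative):

   import java.io.DataInputStream;
   import java.io.FileInputStream;
   import java.io.IOException;

   public final class DevURandomSeed {
     // /dev/urandom never blocks, unlike /dev/random.
     private static final String SOURCE = "/dev/urandom";

     // Read n seed bytes; readFully loops until the buffer is full.
     public static byte[] generateSeed(int n) throws IOException {
       byte[] seed = new byte[n];
       DataInputStream in = new DataInputStream(new FileInputStream(SOURCE));
       try {
         in.readFully(seed);
       } finally {
         in.close();
       }
       return seed;
     }
   }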



On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman jon2...@gmail.com wrote:

 I have attached the jstack pid of namenode output.  Does it appear to be
 stuck in SecureRandom as you noted as a possibility?  I am not sure whether
 this is indicated in the following output:

 ...

main prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
 *at java.io.FileInputStream.readBytes(Native Method)
 *at java.io.FileInputStream.read(FileInputStream.java:236)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0x70e59ae8 (a java.io.BufferedInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0x70e59970 (a java.io.BufferedInputStream)
at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
 *at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
 *at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)




Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Jon Lederman
Hi Ted,

Could you give me a bit more information on how I can overcome this issue?  I 
am running Hadoop on an embedded processor, and networking to the embedded 
processor is turned off.  Is there a quick way to check whether this is in fact 
blocking on my system?  And are there some variables or configuration options 
I can set to avoid any potential blocking behavior?

Thanks.

-Jon
On Jan 3, 2011, at 3:48 PM, Ted Dunning wrote:

 Yes.  It is stuck as suggested.  See the bolded lines.
 
 ...



Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Ted Dunning
try

   dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time.
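
Before running that, you can also just ask the kernel how starved the pool is (a plain Linux counter read, nothing Hadoop-specific):

   cat /proc/sys/kernel/random/entropy_avail

A value near zero means reads from /dev/random, and thus SecureRandom seeding, will block.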

There is no way that I know of to change the behavior of /dev/random except
by changing the file itself to point to a different minor device.  That
would be very bad form.

One thing you may be able to do is to pour lots of entropy into the system via
/dev/urandom.  I was not able to demonstrate this, though, when I just tried
that.  It would be nice if there were a config variable to set that would
change this behavior, but right now, a code change is required (AFAIK).

Another thing to do is replace the use of SecureRandom with a version that
uses /dev/urandom.  That is the point of the code that I linked to.  It
provides a plugin replacement that will not block.
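
Depending on the JDK, there may also be a no-code-change knob: the java.security.egd system property, which you could try adding to HADOOP_OPTS (untested here, so treat it as a suggestion):

   -Djava.security.egd=file:/dev/./urandom

The odd /dev/./urandom spelling matters on some JDK builds, which special-case the literal path /dev/urandom and fall back to the blocking device anyway.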

On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman jon2...@gmail.com wrote:


 Could you give me a bit more information on how I can overcome this issue.
  I am running Hadoop on an embedded processor and networking is turned off
 to the embedded processor. Is there a quick way to check whether this is in
 fact blocking on my system?  And, are there some variables or configuration
 options I can set to avoid any potential blocking behavior?




Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Jon Lederman
Thanks.  Will try that.  One final question: based on the jstack output I sent, 
is it obvious that the system is blocked due to the behavior of /dev/random?  
That is, can you point me to what in the output explicitly or implicitly 
indicates the blocking?  I am trying to understand whether this is in fact the 
problem or whether there could be some other issue.  

If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee 
it will eventually return in some relatively finite period of time such as 
hours, or could it potentially take days, weeks, years or eternity?

Thanks in advance.

-Jon
On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:

 ...
 



Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Ted Dunning
On Mon, Jan 3, 2011 at 4:48 PM, Jon Lederman jon2...@gmail.com wrote:

 Thanks.  Will try that.  One final question, based on the jstack output I
 sent, is it obvious that the system is blocked due to the behavior of
 /dev/random?



I tried to send you a highlighted markup of your jstack output.

The key thing to look for is a thread reading bytes from a call that nests
under SecureRandom.
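
A quick way to spot it without reading the whole dump (assuming GNU grep for the -B flag; the pid placeholder is whatever jps reports for the NameNode):

   jstack <namenode-pid> | grep -B 20 SecureRandom

If FileInputStream.readBytes frames appear just above the SecureRandom frames, the process is blocked reading /dev/random.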


 If I just let the FS command run (i.e., hadoop fs -ls), is there any
 guarantee it will eventually return in some relatively finite period of time
 such as hours, or could it potentially take days, weeks, years or eternity?


It depends on how quiet your machine is.  If it has stuff happening, then it
will unwedge eventually.


Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Konstantin Boudnik
Another possibility to fix it is to install rng-tools, which will allow
you to increase the amount of entropy in your system.
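
For example, something along these lines (package name and flags vary by distro, and rngd needs an entropy source such as a hardware RNG to feed from):

   rngd -r /dev/hwrng

rngd is the piece that actually credits the kernel entropy count (via an ioctl), which is why plain writes to /dev/urandom, as noted earlier in the thread, don't appear to help.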
--
  Take care,
Konstantin (Cos) Boudnik



On Mon, Jan 3, 2011 at 16:48, Jon Lederman jon2...@gmail.com wrote:
 Thanks.  Will try that.  One final question, based on the jstack output I 
 sent, is it obvious that the system is blocked due to the behavior of 
 /dev/random?  That is, can you enlighten me to the output I sent that 
 explicitly or implicitly indicates the blocking?  I am trying to understand 
 whether this is in fact the problem or whether there could be some other 
 issue.

 ...





Re: HDFS FS Commands Hanging System

2011-01-02 Thread Black, Michael (IS)
Did you set your config and format the namenode as per these instructions?
 
http://hadoop.apache.org/common/docs/current/single_node_setup.html
 
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 

 


Re: HDFS FS Commands Hanging System

2011-01-02 Thread Jon Lederman
Hi,

I followed the example precisely.  It seems to me that the NameNode and 
DataNode are not communicating.  I noticed that the log file for my DataNode 
appears suspiciously short.  I believe it should try to connect to the NameNode 
and report such progress.  The log for the DataNode simply shows:

/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/

Also, the log file for the NameNode reports 0 racks and 0 DataNodes, as 
indicated in bold:

/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2011-01-02 16:30:35,093 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:9000
2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-02 16:30:35,196 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,022 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-02 16:30:37,029 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-02 16:30:37,032 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-02 16:30:37,216 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,242 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-02 16:30:38,104 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1726 msecs
2011-01-02 16:30:38,130 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-02 16:30:38,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-02 16:30:38,136 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-02 16:30:38,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14

What should I check to see whether there is communication?  Why should the 
network topology as reported by the Namenode indicate 0 racks and 0 Datanodes?

Also, I am curious what should be in the masters and slaves files when running 
in pseudo-distributed mode.

It seems I need to have both files contain: localhost.  Otherwise, the DataNode 
and/or NameNode do not start.

Any ...

Re: HDFS FS Commands Hanging System

2011-01-02 Thread Esteban Gutierrez Moguel
Hello Jon,

Could you please verify that your node can resolve the host name?
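
For instance, something like this should print an address for both names (exact tooling depends on what your image ships):

   getent hosts localhost
   getent hosts `hostname`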

It would be helpful too if you can attach your configuration files and the
output of:

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

as Todd suggested.

Cheers,
esteban
On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote:
 Hi,

 Still no luck in getting FS commands to work. I did take a look at the
logs. They all look pretty clean with the following exceptions: The DataNode
appears to start up fine. However, the NameNode reports that the Network
Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the
namenode cannot talk to the datanode? Any thoughts on what might be wrong?

 Thanks in advance and happy new year.

 -Jon
 ...

Re: HDFS FS Commands Hanging System

2011-01-02 Thread Jon Lederman
Hi Esteban,

Thanks.  Can you tell me how I can check whether my node can resolve the host 
name?  I don't know precisely how to do that.

When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
I get:

# HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)

11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is6ms.
11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1

Then the system hangs and does not return.  

My core-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
 </property>
</configuration>


My hdfs-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>


My mapred-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
 </property>
</configuration>

My masters and slaves files both indicate: localhost

Thanks for your help.  I really appreciate this.

-Jon
On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:

 Hello Jon,
 
 Could you please verify that your node can resolve the host name?
 
 It would be helpful too if you can attach your configuration files and the
 output of:
 
 HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
 
 as Todd suggested.
 
 Cheers,
 esteban
 On Jan 1, 2011 2:01 PM, Jon Lederman jon2...@gmail.com wrote:
 ...

Re: HDFS FS Commands Hanging System

2011-01-02 Thread Hari Sreekumar
Could this be a java/OS issue? Which java and OS are you using?

Hari

On Sunday, January 2, 2011, Jon Lederman jon2...@gmail.com wrote:
 Hi Esteban,

 Thanks.  Can you tell me how I can check whether my node can resolve the host 
 name?  I don't know precisely how to do that.

 ...

Re: HDFS FS Commands Hanging System

2011-01-02 Thread Harsh J
If you're using Java version 1.6.0_18, avoid it and switch to a more
recent release.
For information on why, check http://wiki.apache.org/hadoop/HadoopJavaVersions

Although I don't think that it could be the real reason behind the
issue here, it may be good to avoid that particular release before
progressing deeper.
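
A quick way to confirm exactly which build you are on:

   java -version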

On Fri, Dec 31, 2010 at 10:30 PM, Jon Lederman jon2...@gmail.com wrote:
 ...



-- 
Harsh J
www.harshj.com


Re: HDFS FS Commands Hanging System

2011-01-02 Thread Esteban Gutierrez Moguel
Hi Jon,

I was able to reproduce your error by shutting down HDFS and setting up nc
to listen for connections on the same port (9000).
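
If you want to reproduce that yourself, the listener was something along these lines (flag spelling differs between netcat variants):

   nc -l -p 9000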

Could you please verify that port 9000 is being used by the right
process (the NameNode)?

PIDs for fuser -n tcp 9000 and jps | grep NameNode should be the same.
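
Spelled out, that is (both print a pid, and the two should match):

   fuser -n tcp 9000
   jps | grep NameNode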

esteban.


On Sun, Jan 2, 2011 at 10:56, Jon Lederman jon2...@gmail.com wrote:

 Hi Esteban,

 Thanks.  Can you tell me how I can check whether my node can resolve the
 host name?  I don't know precisely how to do that.

 ...

Re: HDFS FS Commands Hanging System

2011-01-02 Thread Jon Lederman
Hi Esteban,

Thanks for your response.

I don't have the fuser executable installed on the environment I am running on.

However, I do find the following:

# jps
923 JobTracker
870 SecondaryNameNode
1188 Jps
794 DataNode
996 TaskTracker
727 NameNode
# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address   State
tcp        0      0 (null):sunrpc               (null):*          LISTEN
tcp        0      0 (null):ssh                  (null):*          LISTEN
tcp        2      0 localhost.localdomain:9000  :::*              LISTEN
tcp        0      0 localhost.localdomain:9001  :::*              LISTEN
tcp        0      0 ::%989480:50060             :::*              LISTEN
tcp        0      0 ::%989704:50030             :::*              LISTEN
tcp        0      0 ::%989480:50070             :::*              LISTEN
tcp        0      0 ::%989480:telnet            :::*              LISTEN
udp        0      0 (null):sunrpc               (null):*
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     1281   @MONITOR_617_1
# 

So, all of the daemons are running.  Please note the following out of my log 
files:

When I look at the log files, the NameNode on startup reports:
Network topology has 0 racks and 0 datanodes
Also, my DataNode startup log is suspiciously short, showing only:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
There is no sign in the log of the DataNode attempting to establish 
communication with the NameNode.  It appears to me that the NameNode and 
DataNode aren't communicating, which may be the source of my problem.  However, 
I don't know why this would be or how I can debug it, since I am not sure of the 
internal operation of Hadoop.

Any thoughts on all of this?  Thanks in advance.

-Jon


On Jan 2, 2011, at 2:05 PM, Esteban Gutierrez Moguel wrote:

 Hi Jon,
 
 I was able to reproduce your error by shutting down HDFS and setting up nc
 to listen connections in the same port (9000).
 
 Could you please verify that the port 9000 is being used by the right
 process (NameNode)
 
 PIDs for fuser -n tcp 9000 and jps | grep NameNode should be the same.
 
 esteban.
 
 
 On Sun, Jan 2, 2011 at 10:56, Jon Lederman jon2...@gmail.com wrote:
 
 ...

Re: HDFS FS Commands Hanging System

2011-01-01 Thread Jon Lederman
Hi,

Still no luck in getting FS commands to work.  I did take a look at the logs.  
They all look pretty clean with the following exceptions: The DataNode appears 
to start up fine.  However, the NameNode reports that the Network Topology has 
0 racks and 0 datanodes.  Is this normal?  Is it possible the namenode cannot 
talk to the datanode?  Any thoughts on what might be wrong?

Thanks in advance and happy new year.

-Jon
2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: 
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-01 19:45:29,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-01 19:45:29,965 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,994 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-01 19:45:30,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-01 19:45:30,696 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-01 19:45:30,701 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-01 19:45:30,708 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-01 19:45:30,767 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-01 19:45:30,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1701 msecs
2011-01-01 19:45:30,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-01 19:45:30,948 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-01 19:45:30,958 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-01 19:45:30,963 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-01-01 19:45:35,038 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-01 19:45:35,041 INFO org.mortbay.log: jetty-6.1.14
sc-ssh-svr1 logs $ 

On Dec 31, 2010, at 4:28 PM, li ping wrote:

 I suggest you should look 

HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi All,

I have been working on running Hadoop on a new microprocessor architecture in 
pseudo-distributed mode.  I have been successful in getting SSH configured.  I 
am also able to start a namenode, secondary namenode, tasktracker, jobtracker 
and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way such as the 
simple command hadoop fs -ls, the system hangs.  So it appears to me that some 
communication is not occurring properly.  Does anyone have any suggestions what 
I look into in order to fix this problem?

Thanks in advance.

-Jon

Re: HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi Michael,

Thanks for your response.  It doesn't seem to be an issue with safemode.

Even when I try the command dfsadmin -safemode get, the system hangs.  I am 
unable to execute any FS shell commands other than hadoop fs -help.

I am wondering whether this is an issue with communication between the daemons.  
What should I be looking at there?  Or could it be something else?

When I do jps, I do see all the daemons listed.

Any other thoughts?

Thanks again and happy new year.

-Jon
On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

 Try checking your dfs status
 
 hadoop dfsadmin -safemode get
 
 Probably says ON
 
 hadoop dfsadmin -safemode leave
 
 Somebody else can probably say how to make this happen every reboot
 
 Michael D. Black
 Senior Scientist
 Advanced Analytics Directorate
 Northrop Grumman Information Systems
 
 
 
 
 From: Jon Lederman [mailto:jon2...@gmail.com]
 Sent: Fri 12/31/2010 11:00 AM
 To: common-user@hadoop.apache.org
 Subject: EXTERNAL:HDFS FS Commands Hanging System
 
 
 
 Hi All,
 
 I have been working on running Hadoop on a new microprocessor architecture in 
 pseudo-distributed mode.  I have been successful in getting SSH configured.  
 I am also able to start a namenode, secondary namenode, tasktracker, 
 jobtracker and datanode as evidenced by the response I get from jps.
 
 However, when I attempt to interact with the file system in any way such as 
 the simple command hadoop fs -ls, the system hangs.  So it appears to me that 
 some communication is not occurring properly.  Does anyone have any 
 suggestions what I look into in order to fix this problem?
 
 Thanks in advance.
 
 -Jon 
 



Re: HDFS FS Commands Hanging System

2010-12-31 Thread Todd Lipcon
Hi Jon,

Try:
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman jon2...@gmail.com wrote:

 ...




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS FS Commands Hanging System

2010-12-31 Thread li ping
I suggest you look through the logs to see if there is any error.
The second point I need to make is about which node you run the
command hadoop fs -ls on: whichever node that is, its configuration
item fs.default.name should point to the HDFS, as in the example below.
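
For example, in core-site.xml on the node where you run the command (the host name here is illustrative):

   <property>
     <name>fs.default.name</name>
     <value>hdfs://namenode-host:9000</value>
   </property>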

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman jon2...@gmail.com wrote:

 ...



-- 
-李平