Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Harsh J harsh@... writes:

Hello RX,

Could you paste your DFS configuration and the DN end-to-end log into a mail/pastebin-link?

On Fri, May 27, 2011 at 5:31 AM, Xu, Richard richard.xu@... wrote:
> Hi Folks,
>
> We are trying to get HBase and Hadoop running on clusters, using 2 Solaris servers for now. Because of the incompatibility issue between HBase and Hadoop, we have to stick with the hadoop-0.20.2-append release.
>
> It is very straightforward to get hadoop-0.20.203 running, but we have been stuck for several days with hadoop-0.20.2, even the official release, not just the append version.
>
> 1. Once we try to run start-mapred.sh (hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker), the following errors show up in the namenode and jobtracker logs:
>
> 2011-05-26 12:30:29,169 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> 2. Also, Configured Capacity is 0, so we cannot put any file to HDFS.
>
> 3. On the datanode server there is no error in the logs, but the tasktracker log has the following suspicious thing:
>
> 2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41904: starting
> 2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 41904: starting
> 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 41904: starting
> 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 41904: starting
> 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 41904: starting
> .
> 2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 41904: starting
> 2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:41904
> 2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_loanps3d:localhost/127.0.0.1:41904
>
> I have tried all suggestions found so far, including 1) removing the hadoop-name and hadoop-data folders and reformatting the namenode; 2) cleaning up all temp files/folders under /tmp; but nothing works.
>
> Your help is greatly appreciated.
> Thanks,
> RX

Hi,

I am able to start the namenode and the datanode, but while starting the jobtracker it throws an error like:

  FATAL mapred.JobTracker: java.net.BindException: Problem binding to localhost/127.0.0.1:5102 : Address already in use

Kindly help me ASAP.

Regards,
Srinivas
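For a java.net.BindException like Srinivas's, the usual first step is to find what already holds the port. A hedged sketch (5102 is taken from the log line above; netstat's -p flag is Linux-only, and lsof may need to be installed separately):

  # Linux: which process is already listening on the JobTracker port?
  $ netstat -tlnp | grep 5102
  # alternative where lsof is available (Linux and Solaris):
  $ lsof -i :5102
  # then stop the stale process, or point mapred.job.tracker at a free port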
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On Thu, May 26, 2011 at 07:01PM, Xu, Richard wrote:
> 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Is your DFS up and running, by any chance?

Cos
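A quick way to answer that question from the shell; jps ships with the JDK, and dfsadmin -report is part of 0.20, though the exact output format varies:

  # on each node: are the HDFS daemons actually up?
  $ jps
  # expect NameNode on the master and DataNode on the slaves

  # from the master: how many datanodes have registered with the namenode?
  $ bin/hadoop dfsadmin -report
  # "Datanodes available: 0" (and Configured Capacity: 0) would match the
  # "could only be replicated to 0 nodes" error above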
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On May 27, 2011, at 7:26 AM, DAN wrote:
> You see you have 2 Solaris servers for now, and dfs.replication is set to 3. These don't match.

That doesn't matter. HDFS will basically flag any files written with a warning that they are under-replicated. The problem is that the datanode processes aren't running and/or aren't communicating with the namenode. That's what

  java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

means.

It should also be pointed out that writing to /tmp (the default) is a bad idea. This should get changed.

Also, since you are running Solaris, check the FAQ for the settings you'll need in order to make Hadoop's broken username detection work properly, amongst other things.
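A minimal sketch of moving HDFS storage off /tmp. The paths are placeholders, and writing out a whole hdfs-site.xml like this assumes the file holds nothing else yet; otherwise merge the properties into the existing <configuration> block:

  $ cat > conf/hdfs-site.xml <<'EOF'
  <?xml version="1.0"?>
  <configuration>
    <!-- keep namenode metadata and block data on durable disks, not /tmp -->
    <property>
      <name>dfs.name.dir</name>
      <value>/var/hadoop/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/var/hadoop/dfs/data</value>
    </property>
    <!-- with 2 datanodes, replication 2 avoids the under-replication warnings -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
  </configuration>
  EOF
  # changing dfs.name.dir requires reformatting the namenode afterwards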
RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Hi Allen,

Thanks a lot for your response. I agree with you that the replication setting does not matter.

What really bothers me is that in the same environment, with the same configuration, hadoop-0.20.203 took us 3 minutes, while 0.20.2 has taken 3 days. Can you please shed more light on how to make Hadoop's broken username detection work properly?

-----Original Message-----
From: Allen Wittenauer [mailto:a...@apache.org]
Sent: Friday, May 27, 2011 11:42 AM
To: common-user@hadoop.apache.org
Cc: Xu, Richard [ICG-IT]
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

On May 27, 2011, at 7:26 AM, DAN wrote:
> You see you have 2 Solaris servers for now, and dfs.replication is set to 3. These don't match.

That doesn't matter. HDFS will basically flag any files written with a warning that they are under-replicated. The problem is that the datanode processes aren't running and/or aren't communicating with the namenode. That's what the "java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1" means.

It should also be pointed out that writing to /tmp (the default) is a bad idea. This should get changed.

Also, since you are running Solaris, check the FAQ for the settings you'll need in order to make Hadoop's broken username detection work properly, amongst other things.
RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Add more to that: I also tried starting 0.20.2 on a Linux machine in distributed mode, and got the same error. I had successfully started 0.20.203 on that Linux machine with the same config. So it seems the problem is not related to Solaris.

Could it be caused by a port? I checked a few and did not find any blocked.

-----Original Message-----
From: Xu, Richard [ICG-IT]
Sent: Friday, May 27, 2011 4:18 PM
To: 'Allen Wittenauer'; 'common-user@hadoop.apache.org'
Subject: RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Hi Allen,

Thanks a lot for your response. I agree with you that the replication setting does not matter.

What really bothers me is that in the same environment, with the same configuration, hadoop-0.20.203 took us 3 minutes, while 0.20.2 has taken 3 days. Can you please shed more light on how to make Hadoop's broken username detection work properly?

-----Original Message-----
From: Allen Wittenauer [mailto:a...@apache.org]
Sent: Friday, May 27, 2011 11:42 AM
To: common-user@hadoop.apache.org
Cc: Xu, Richard [ICG-IT]
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

On May 27, 2011, at 7:26 AM, DAN wrote:
> You see you have 2 Solaris servers for now, and dfs.replication is set to 3. These don't match.

That doesn't matter. HDFS will basically flag any files written with a warning that they are under-replicated. The problem is that the datanode processes aren't running and/or aren't communicating with the namenode. That's what the "java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1" means.

It should also be pointed out that writing to /tmp (the default) is a bad idea. This should get changed.

Also, since you are running Solaris, check the FAQ for the settings you'll need in order to make Hadoop's broken username detection work properly, amongst other things.
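One way to rule ports in or out is to probe the namenode's RPC port (9000 in the logs above) from each datanode host; "namenode-host" below is a placeholder for the real hostname:

  # from a datanode: is the namenode RPC port reachable?
  $ telnet namenode-host 9000
  # "Connected to ..." means the port is open; "Connection refused" or a
  # hang points at a firewall, or at /etc/hosts mapping the hostname to
  # 127.0.0.1 -- note the tasktracker above came up at localhost/127.0.0.1,
  # an address other machines cannot reach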
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On May 27, 2011, at 1:18 PM, Xu, Richard wrote:
> Hi Allen,
>
> Thanks a lot for your response. I agree with you that the replication setting does not matter.
>
> What really bothers me is that in the same environment, with the same configuration, hadoop-0.20.203 took us 3 minutes, while 0.20.2 has taken 3 days.
>
> Can you please shed more light on how to make Hadoop's broken username detection work properly?

It's in the FAQ so that I don't have to do that. http://wiki.apache.org/hadoop/FAQ

Also, check your logs. All your logs. Not just the namenode log.
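For reference, the commonly cited FAQ entry for Solaris is that Hadoop 0.20 shells out to whoami and expects it on the PATH, while Solaris keeps it in /usr/ucb. A sketch, assuming that is the entry meant here:

  # in conf/hadoop-env.sh on the Solaris boxes:
  export PATH=$PATH:/usr/ucb

  # sanity check before restarting the daemons:
  $ which whoami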
Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Hi Folks,

We are trying to get HBase and Hadoop running on clusters, using 2 Solaris servers for now. Because of the incompatibility issue between HBase and Hadoop, we have to stick with the hadoop-0.20.2-append release.

It is very straightforward to get hadoop-0.20.203 running, but we have been stuck for several days with hadoop-0.20.2, even the official release, not just the append version.

1. Once we try to run start-mapred.sh (hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker), the following errors show up in the namenode and jobtracker logs:

2011-05-26 12:30:29,169 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

2. Also, Configured Capacity is 0, so we cannot put any file to HDFS.

3. On the datanode server there is no error in the logs, but the tasktracker log has the following suspicious thing:

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41904: starting
2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 41904: starting
.
2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 41904: starting
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:41904
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_loanps3d:localhost/127.0.0.1:41904

I have tried all suggestions found so far, including 1) removing the hadoop-name and hadoop-data folders and reformatting the namenode; 2) cleaning up all temp files/folders under /tmp; but nothing works.

Your help is greatly appreciated.

Thanks,
RX
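For completeness, a sketch of the clean-restart sequence that steps 1) and 2) above describe, with placeholder paths; the format step erases HDFS, so this is only for a cluster with no data worth keeping:

  $ bin/stop-all.sh
  # clear old state on every node; /tmp/hadoop-* covers the default
  # dfs.name.dir / dfs.data.dir locations
  $ rm -rf /tmp/hadoop-*
  $ bin/hadoop namenode -format
  $ bin/start-dfs.sh
  # confirm the datanodes came up before starting mapred:
  $ tail -20 logs/hadoop-*-datanode-*.log
  $ bin/start-mapred.sh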