Yep, those hostnames are the private ones.

J-D
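A rough way to check "listens where it is expected" from another node is to resolve the name and try a plain TCP connect. This is only a sketch, assuming Python is available on the machine; `can_connect` and `describe` are hypothetical helper names, not part of HBase or ZooKeeper, and 2181 is the client port from the hbase-site.xml quoted below.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and names that do not resolve.
        return False


def describe(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Summarize how a hostname resolves here and whether the port answers."""
    try:
        address = socket.gethostbyname(host)
    except OSError:
        return f"{host}: does not resolve from this machine"
    if can_connect(host, port, timeout):
        state = "accepting connections"
    else:
        state = "refused or unreachable"
    return f"{host} -> {address}:{port} {state}"
```

Running something like `print(describe("domU-12-31-38-00-9D-E3"))` from each slave would show whether that name resolves there and whether anything answers on the ZooKeeper client port. A "refused" result matches the `java.net.ConnectException: Connection refused` in the logs below.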
On Fri, Dec 4, 2009 at 12:52 PM, Something Something <[email protected]> wrote:
> Thanks, Jean. My bad. I will try with the private hostnames later. I
> believe under EC2 they look something like this:
>
> domU-12-31-38-00-9D-E3
>
> On Fri, Dec 4, 2009 at 12:41 PM, Patrick Hunt <[email protected]> wrote:
>
>> I'm not familiar with EC2. When you say "listen on private hostname", what
>> does that mean? Do you mean "by default listen on an interface with a
>> non-routable (local-only) IP"? Or something else? Is there an AWS page you
>> can point me to?
>>
>> Patrick
>>
>> Jean-Daniel Cryans wrote:
>>
>>> When you saw:
>>>
>>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>>> *Safe mode will be turned off automatically*.
>>>
>>> It means that HDFS is blocking everything (aka safe mode) until all
>>> datanodes have reported for duty (and then it waits for 30 seconds to make
>>> sure).
>>>
>>> When you saw:
>>>
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = *NoNode for /hbase/master*
>>>
>>> It means that the Master node didn't write its znode in ZooKeeper
>>> because... when you saw:
>>>
>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception
>>> closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>>> java.net.ConnectException: Connection refused
>>>
>>> It really means that the connection was refused. It then says it
>>> attempted to connect to ec2-174-129-127-141.compute-1.amazonaws.com
>>> but wasn't able to. AFAIK in EC2 the Java processes tend to listen on
>>> their private hostname, not the public one (which would be bad
>>> anyway).
>>>
>>> Bottom line: make sure each process listens where it is expected, and it
>>> should then work well.
>>>
>>> J-D
>>>
>>> On Fri, Dec 4, 2009 at 11:23 AM, Something Something
>>> <[email protected]> wrote:
>>>
>>>> Hadoop: 0.20.1
>>>>
>>>> HBase: 0.20.2
>>>>
>>>> ZooKeeper: The one which gets started by default by HBase.
>>>>
>>>> HBase logs:
>>>>
>>>> 1) Master log shows this WARN message, but then it says 'connection
>>>> successful'
>>>>
>>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>>>> java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>> 2009-12-04 07:07:37,199 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned)
>>>> 2009-12-04 07:07:37,200 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition
>>>> 2009-12-04 07:07:37,667 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181
>>>> 2009-12-04 07:07:37,668 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.252.162.19:46195 remote=ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181]
>>>> 2009-12-04 07:07:37,670 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>>>>
>>>> 2) Regionserver log shows this... but later seems to have recovered:
>>>>
>>>> 2009-12-04 07:07:36,576 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@4ee70b
>>>> java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>> 2009-12-04 07:07:36,742 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:304)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:385)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2474)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2542)
>>>> 2009-12-04 07:07:36,743 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
>>>>
>>>> 3) ZooKeeper log: Nothing much in there... just a starting message line..
>>>> followed by
>>>>
>>>> ulimit -n 1024
>>>>
>>>> I looked at the archives. There was one mail that talked about 'ulimit'.
>>>> I wonder if that has something to do with it.
>>>>
>>>> Thanks for your help.
>>>>
>>>> On Fri, Dec 4, 2009 at 8:18 AM, Mark Vigeant
>>>> <[email protected]> wrote:
>>>>
>>>>> When I first started my HBase cluster, it too gave me the NoNode for
>>>>> /hbase/master several times before it started working, and I believe
>>>>> this is a common beginner's error (I've seen it in a few emails in the
>>>>> past 2 weeks).
>>>>>
>>>>> What versions of HBase, Hadoop and ZooKeeper are you using?
>>>>>
>>>>> Also, take a look in your HBASE_HOME/logs folder. That would be a good
>>>>> place to start looking for some answers.
>>>>>
>>>>> -Mark
>>>>>
>>>>> -----Original Message-----
>>>>> From: Something Something [mailto:[email protected]]
>>>>> Sent: Friday, December 04, 2009 2:28 AM
>>>>> To: [email protected]
>>>>> Subject: Starting HBase in fully distributed mode...
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to get Hadoop/HBase up and running in fully distributed
>>>>> mode. For now, I have only *1 Master & 2 Slaves*.
>>>>>
>>>>> Hadoop starts correctly, I think. The only exception I see in the
>>>>> various log files is this one:
>>>>>
>>>>> org.apache.hadoop.ipc.RemoteException:
>>>>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>>>>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>>>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>>>>> *Safe mode will be turned off automatically*.
>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1696)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1676)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>
>>>>> Somehow this doesn't sound critical, so I assumed everything was good to
>>>>> go with Hadoop.
>>>>>
>>>>> So then I started HBase and opened a shell (hbase shell). So far
>>>>> everything looks good. Now when I try to run a 'list' command, I keep
>>>>> getting this message:
>>>>>
>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = *NoNode for /hbase/master*
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
>>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:328)
>>>>>
>>>>> Here's what I have in my *Master hbase-site.xml*:
>>>>>
>>>>> <configuration>
>>>>>   <property>
>>>>>     <name>hbase.rootdir</name>
>>>>>     <value>hdfs://master:54310/hbase</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.cluster.distributed</name>
>>>>>     <value>true</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>     <value>2181</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.zookeeper.quorum</name>
>>>>>     <value>master,slave1,slave2</value>
>>>>>   </property>
>>>>>   <property>
>>>>>
>>>>> The *Slave* hbase-site.xml files are set as follows:
>>>>>
>>>>>   <property>
>>>>>     <name>hbase.rootdir</name>
>>>>>     <value>hdfs://master:54310/hbase</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.cluster.distributed</name>
>>>>>     <value>false</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>     <value>2181</value>
>>>>>   </property>
>>>>>
>>>>> In the hbase-env.sh file on ALL 3 machines I have set JAVA_HOME and
>>>>> set the HBase classpath as follows:
>>>>>
>>>>> export HBASE_CLASSPATH=$HBASE_CLASSPATH:/ebs1/hadoop-0.20.1/conf
>>>>>
>>>>> On *Master* I have added Master & Slaves IP hostnames to the
>>>>> *regionservers* file.
>>>>> On *slaves*, the regionservers file is empty.
>>>>>
>>>>> I have run hadoop namenode -format multiple times, but still keep
>>>>> getting "NoNode for /hbase/master". What step did I miss? Thanks for
>>>>> your help.
