Re: Starting HBase in fully distributed mode...

Jean-Daniel Cryans Fri, 04 Dec 2009 14:45:53 -0800

It seems not... For example, on my dev machine I have one interface for
the wired network and another one for wireless. When I start ZK it binds
to only one interface, so if I connect to the other IP it doesn't work.
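
For anyone following along, here is a minimal sketch of the difference
between the wildcard bind Patrick quotes below and a bind to one specific
address. It is only an illustration, not ZooKeeper's actual startup code:
the class and method names are made up, port 0 just asks the OS for any
free port, and the loopback address stands in for "one particular
interface".

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class; names are illustrative only.
public class BindSketch {

    /** Binds with InetSocketAddress(port): wildcard address, given port. */
    public static ServerSocketChannel bindWildcard(int port) throws IOException {
        ServerSocketChannel ss = ServerSocketChannel.open();
        ss.socket().bind(new InetSocketAddress(port));
        return ss;
    }

    /** Binds to exactly one local address, so other local IPs are not served. */
    public static ServerSocketChannel bindSingle(InetAddress addr, int port) throws IOException {
        ServerSocketChannel ss = ServerSocketChannel.open();
        ss.socket().bind(new InetSocketAddress(addr, port));
        return ss;
    }

    /** True if the channel's listening socket is bound to the wildcard address. */
    public static boolean isWildcardBound(ServerSocketChannel ss) {
        return ss.socket().getInetAddress().isAnyLocalAddress();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel any = bindWildcard(0);
             ServerSocketChannel one = bindSingle(InetAddress.getLoopbackAddress(), 0)) {
            // Print the local socket address each channel ended up bound to.
            System.out.println("wildcard bind: " + any.socket().getLocalSocketAddress());
            System.out.println("single bind:   " + one.socket().getLocalSocketAddress());
        }
    }
}
```

Running something like this on the box in question, and comparing it with
what netstat shows for port 2181, should tell whether the server socket is
really bound to a single address or to the wildcard.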


J-D

On Fri, Dec 4, 2009 at 2:35 PM, Patrick Hunt <[email protected]> wrote:
> Sorry, but I'm still not able to grok this issue. Perhaps you can shed more
> light: here's the exact code from our server to bind to the client port:
>
>    ss.socket().bind(new InetSocketAddress(port));
>
> My understanding from the Javadoc is this:
>
>    public InetSocketAddress(int port)
>        "Creates a socket address where the IP address is the wildcard
> address and the port number a specified value."
>
>
> AFAIK this binds the socket to the specified port on every IP of every
> interface of the host. Where am I going wrong?
>
> Patrick
>
> Jean-Daniel Cryans wrote:
>>
>> The first two definitions here are what I'm talking about:
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1346
>>
>> So by default it usually doesn't listen on the interface associated
>> with the hostname ec2-IP-compute-1.amazonaws.com but on the other one
>> (IIRC starts with dom-).
>>
>> J-D
>>
>> On Fri, Dec 4, 2009 at 12:41 PM, Patrick Hunt <[email protected]> wrote:
>>>
>>> I'm not familiar with EC2. When you say "listen on private hostname", what
>>> does that mean? Do you mean "by default listen on an interface with a
>>> non-routable (local-only) IP"? Or something else? Is there an AWS page you
>>> can point me to?
>>>
>>> Patrick
>>>
>>> Jean-Daniel Cryans wrote:
>>>>
>>>> When you saw:
>>>>
>>>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>>>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>>>> *Safe mode will be turned off automatically*.
>>>>
>>>> It means that HDFS is blocking all changes (aka safe mode) until the
>>>> datanodes have reported enough of their blocks to reach that threshold
>>>> (and then it waits another 30 seconds to make sure).
>>>>
>>>> When you saw:
>>>>
>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>> KeeperErrorCode = *NoNode for /hbase/master*
>>>>
>>>> It means that the Master node didn't write its znode in ZooKeeper
>>>> because... when you saw:
>>>>
>>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception
>>>> closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> It really means that the connection was refused. It then says it
>>>> attempted to connect to ec2-174-129-127-141.compute-1.amazonaws.com
>>>> but wasn't able to. AFAIK in EC2 the java processes tend to listen on
>>>> their private hostname not the public one (which would be bad
>>>> anyways).
>>>>
>>>> Bottom line: make sure everything listens where it is expected to, and
>>>> it should then work well.
>>>>
>>>> J-D
>>>>
>>>> On Fri, Dec 4, 2009 at 11:23 AM, Something Something
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hadoop: 0.20.1
>>>>>
>>>>> HBase: 0.20.2
>>>>>
>>>>> Zookeeper: The one which gets started by default by HBase.
>>>>>
>>>>>
>>>>> HBase logs:
>>>>>
>>>>> 1)  Master log shows this WARN message, but then it says 'connection
>>>>> successful'
>>>>>
>>>>>
>>>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>>>>> java.net.ConnectException: Connection refused
>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>>>> java.nio.channels.ClosedChannelException
>>>>>      at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>>>      at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>>>> java.nio.channels.ClosedChannelException
>>>>>      at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>>>      at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>>> 2009-12-04 07:07:37,199 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned)
>>>>> 2009-12-04 07:07:37,200 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition
>>>>> 2009-12-04 07:07:37,667 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181
>>>>> 2009-12-04 07:07:37,668 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.252.162.19:46195 remote=ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181]
>>>>> 2009-12-04 07:07:37,670 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>>>>>
>>>>>
>>>>>
>>>>> 2)  Regionserver log shows this... but later seems to have recovered:
>>>>>
>>>>> 2009-12-04 07:07:36,576 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@4ee70b
>>>>> java.net.ConnectException: Connection refused
>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>>>> java.nio.channels.ClosedChannelException
>>>>>      at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>>>      at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>>>> java.nio.channels.ClosedChannelException
>>>>>      at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>>>      at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>>> 2009-12-04 07:07:36,742 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>      at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>>>>      at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:304)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:385)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>>>>      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>>      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2474)
>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2542)
>>>>> 2009-12-04 07:07:36,743 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
>>>>>
>>>>>
>>>>>
>>>>> 3)  ZooKeeper log:  Nothing much in there... just a starting message
>>>>> line, followed by
>>>>>
>>>>> ulimit -n 1024
>>>>>
>>>>> I looked at the archives.  There was one mail that talked about 'ulimit'.
>>>>> I wonder if that has something to do with it.
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 4, 2009 at 8:18 AM, Mark Vigeant
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> When I first started my HBase cluster, it too gave me the NoNode for
>>>>>> /hbase/master several times before it started working, and I believe
>>>>>> this is a common beginner's error (I've seen it in a few emails in the
>>>>>> past 2 weeks).
>>>>>>
>>>>>> What versions of HBase, Hadoop and ZooKeeper are you using?
>>>>>>
>>>>>> Also, take a look in your HBASE_HOME/logs folder. That would be a good
>>>>>> place to start looking for some answers.
>>>>>>
>>>>>> -Mark
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Something Something [mailto:[email protected]]
>>>>>> Sent: Friday, December 04, 2009 2:28 AM
>>>>>> To: [email protected]
>>>>>> Subject: Starting HBase in fully distributed mode...
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to get Hadoop/HBase up and running in fully distributed
>>>>>> mode. For now, I have only *1 Master & 2 Slaves*.
>>>>>>
>>>>>> Hadoop starts correctly, I think.  The only exception I see in the
>>>>>> various log files is this one:
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>>>>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. *Safe mode will be turned off automatically*.
>>>>>>      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1696)
>>>>>>      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1676)
>>>>>>      at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>
>>>>>>
>>>>>> Somehow this doesn't sound critical, so I assumed everything was good
>>>>>> to go with Hadoop.
>>>>>>
>>>>>>
>>>>>> So then I started HBase and opened a shell (hbase shell).  So far
>>>>>> everything looks good.  Now when I try to run a 'list' command, I keep
>>>>>> getting this message:
>>>>>>
>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = *NoNode for /hbase/master*
>>>>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>>      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
>>>>>>      at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:328)
>>>>>>
>>>>>>
>>>>>> Here's what I have in my *Master hbase-site.xml*
>>>>>>
>>>>>> <configuration>
>>>>>>  <property>
>>>>>>  <name>hbase.rootdir</name>
>>>>>>  <value>hdfs://master:54310/hbase</value>
>>>>>>  </property>
>>>>>>  <property>
>>>>>>  <name>hbase.cluster.distributed</name>
>>>>>>  <value>true</value>
>>>>>>  </property>
>>>>>>  <property>
>>>>>>  <name>hbase.zookeeper.property.clientPort</name>
>>>>>>  <value>2181</value>
>>>>>>  </property>
>>>>>>  <property>
>>>>>>  <name>hbase.zookeeper.quorum</name>
>>>>>>  <value>master,slave1,slave2</value>
>>>>>>  </property>
>>>>>> <property>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The *Slave *hbase-site.xml are set as follows:
>>>>>>
>>>>>>  <property>
>>>>>>  <name>hbase.rootdir</name>
>>>>>>  <value>hdfs://master:54310/hbase</value>
>>>>>>  </property>
>>>>>>  <property>
>>>>>>  <name>hbase.cluster.distributed</name>
>>>>>>  <value>false</value>
>>>>>>  </property>
>>>>>>  <property>
>>>>>>  <name>hbase.zookeeper.property.clientPort</name>
>>>>>>  <value>2181</value>
>>>>>>  </property>
>>>>>>
>>>>>>
>>>>>> In the hbase-env.sh file on ALL 3 machines I have set JAVA_HOME and
>>>>>> set the HBase classpath as follows:
>>>>>>
>>>>>> export HBASE_CLASSPATH=$HBASE_CLASSPATH:/ebs1/hadoop-0.20.1/conf
>>>>>>
>>>>>>
>>>>>> On *Master* I have added Master & Slaves IP hostnames to the
>>>>>> *regionservers* file. On *slaves*, the regionservers file is empty.
>>>>>>
>>>>>>
>>>>>> I have run hadoop namenode -format multiple times, but still keep
>>>>>> getting "NoNode for /hbase/master".  What step did I miss?  Thanks for
>>>>>> your help.
>>>>>>
>
