Re: Starting HBase in fully distributed mode...

Patrick Hunt Mon, 07 Dec 2009 14:58:25 -0800

I do see this, but it's pretty old:

http://markmail.org/message/vvb2i2brxezwrxtt


It's now clear to me how we would work around that though.

Also, there's an interesting comment in the discussion section of thelink you sent me to EIP:

"This doc need to explicitly point out that EIP DNS names are mapped tointernal IP addresses when used within EC2."


Patrick

Patrick Hunt wrote:

Hi J-D, any insight on why I am not seeing what you are seeing? (myquestions below?) I'd like to address this but I won't be able to unlessI can understand fully, and hopefully replicate, the environment thatcauses this to happen.
Regards,

Patrick

Patrick Hunt wrote:
That is weird because it works for me. I just tried your example (eth0vs ath0) and I was able to "echo stat |nc <ip eth0|ath0> 2181" as wellas connect a ZK client successfully using either IP address.
netstat -a shows this:
tcp6       0      0 [::]:2181   [::]:*    LISTEN


What do you see for netstat?

I'm on ipv4, are you running ipv6?

Patrick


Jean-Daniel Cryans wrote:
It seems not... For example on my dev machine I have an interface for
wired network and another one for wireless. When I start ZK it binds
on only one interface so if I connect to the other IP it doesn't work.

J-D

On Fri, Dec 4, 2009 at 2:35 PM, Patrick Hunt <[email protected]> wrote:
Sorry, but I'm still not able to grok this issue. Perhaps you canshed morelight: here's the exact code from our server to bind to the clientport:
   ss.socket().bind(new InetSocketAddress(port));

my understanding from the java docs is this:

   public InetSocketAddress(int port)
       "Creates a socket address where the IP address is the wildcard
address and the port number a specified value."


afaik this binds the socket onto the specified port for any ip on any
interface of the host. Where am I going wrong?

Patrick

Jean-Daniel Cryans wrote:
The first two definitions here is what I'm talking about
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1346
So by default it usually doesn't listen on the interface associated
with the hostname ec2-IP-compute-1.amazonaws.com but on the other one
(IIRC starts with dom-).

J-D
On Fri, Dec 4, 2009 at 12:41 PM, Patrick Hunt <[email protected]>wrote:
I'm not familiar with ec2, when you say "listen on privatehostname" what
does that mean? Do you mean "by default listen on an interface with a
non-routable (localonly) ip"? Or something else. Is there an awspage you
can point me to?

Patrick

Jean-Daniel Cryans wrote:
When you saw:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannotdelete
/ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold
0.9990.
*Safe
mode will be turned off automatically*.

It means that HDFS is blocking everything (aka safe mode) until all
datanodes reported for duty (and then it waits for 30 seconds tomake
sure).

When you saw:

Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = *NoNode for /hbase/master*

It means that the Master node didn't write his znode in Zookeeper
because... when you saw:
2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn:Exception
closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
java.net.ConnectException: Connection refused

It really means that the connection was refused. It then says it
attempted to connect to ec2-174-129-127-141.compute-1.amazonaws.com
but wasn't able to. AFAIK in EC2 the java processes tend tolisten on
their private hostname not the public one (which would be bad
anyways).

Bottom line, make sure stuff listens where they are expected and it
should then work well.

J-D

On Fri, Dec 4, 2009 at 11:23 AM, Something Something
<[email protected]> wrote:
Hadoop: 0.20.1

HBase: 0.20.2

Zookeeper: The one which gets started by default by HBase.


HBase logs:
1) Master log shows this WARN message, but then it says'connection
successful'
2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn:Exception
closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn:Ignoring
exception during shutdown input
java.nio.channels.ClosedChannelException
     at
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)atsun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
     at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn:Ignoring
exception during shutdown output
java.nio.channels.ClosedChannelException
     at
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)atsun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
     at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2009-12-04 07:07:37,199 INFO
org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2009-12-04 07:07:37,200 INFO
org.apache.hadoop.hbase.master.RegionManager:
ROOT inserted into regionsInTransition
2009-12-04 07:07:37,667 INFO org.apache.zookeeper.ClientCnxn:
Attempting
connection to server
ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181
2009-12-04 07:07:37,668 INFO org.apache.zookeeper.ClientCnxn:Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.252.162.19:46195 remote=
ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181]
2009-12-04 07:07:37,670 INFO org.apache.zookeeper.ClientCnxn:Server
connection successful
2) Regionserver log shows this... but later seems to haverecovered:
2009-12-04 07:07:36,576 WARN org.apache.zookeeper.ClientCnxn:Exception
closing session 0x0 to sun.nio.ch.selectionkeyi...@4ee70b
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn:Ignoring
exception during shutdown input
java.nio.channels.ClosedChannelException
     at
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)atsun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
     at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn:Ignoring
exception during shutdown output
java.nio.channels.ClosedChannelException
     at
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)atsun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
     at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
     at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2009-12-04 07:07:36,742 WARN
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set
watcher
on
ZNode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/master
     at
org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
     at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
     at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:304)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:385)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)atsun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
     at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
     at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)atjava.lang.reflect.Constructor.newInstance(Constructor.java:513)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2474)
     at
org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2542)
2009-12-04 07:07:36,743 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set
watcher
on
ZooKeeper master address. Retrying.



3)  Zookeepr log:  Nothing much in there... just a starting message
line..
followed by

ulimit -n 1024
I looked at archives. There was one mail that talked about'ulimit'.
 Wonder if that has something to do with it.

Thanks for your help.



On Fri, Dec 4, 2009 at 8:18 AM, Mark Vigeant
<[email protected]>wrote:
When I first started my hbase cluster, it too gave me thenonode for/hbase/master several times before it started working, and Ibelieve
this is
a common beginner's error (I've seen it in a few emails in thepast 2
weeks).

What versions of HBase, Hadoop and ZooKeeper are you using?
Also, take a look in your HBASE_HOME/logs folder. That would bea good
place to start looking for some answers.

-Mark

-----Original Message-----
From: Something Something [mailto:[email protected]]
Sent: Friday, December 04, 2009 2:28 AM
To: [email protected]
Subject: Starting HBase in fully distributed mode...

Hello,
I am trying to get Hadoop/HBase up and running in a fullydistributed
mode.
 For now, I have only *1 Master & 2 Slaves*.
The Hadoop starts correctly.. I think. The only exception Isee in
various
log files is this one...


org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
delete
/ebs1/mapred/system,/ebs2/mapred/system. Name node is in safemode.
The ratio of reported blocks 0.0000 has not reached the threshold
0.9990.
*Safe
mode will be turned off automatically*.
     at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1696)
     at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1676)
     at
org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethod)
Somehow this doesn't sound critical, so I assumed everythingwas good
to
go
with Hadoop.


So then I started HBase and opened a shell (hbase shell).  So far
everything
looks good. Now when I try to run a 'list' command, I keepgetting
this
message:

Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = *NoNode for /hbase/master*
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:328)
Here's what I have in my *Master hbase-site.xml*

<configuration>
 <property>
 <name>hbase.rootdir</name>
 <value>hdfs://master:54310/hbase</value>
 </property>
 <property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
 </property>
 <property>
 <name>hbase.zookeeper.property.clientPort</name>
 <value>2181</value>
 </property>
 <property>
 <name>hbase.zookeeper.quorum</name>
 <value>master,slave1,slave2</value>
 </property>
<property>



The *Slave *hbase-site.xml are set as follows:

 <property>
 <name>hbase.rootdir</name>
 <value>hdfs://master:54310/hbase</value>
 </property>
 <property>
 <name>hbase.cluster.distributed</name>
 <value>false</value>
 </property>
 <property>
 <name>hbase.zookeeper.property.clientPort</name>
 <value>2181</value>
 </property>
In the hbase-env.sh file on ALL 3 machines I have set theJAVA_HOME
and
set
the HBase classpath as follows:

export HBASE_CLASSPATH=$HBASE_CLASSPATH:/ebs1/hadoop-0.20.1/conf


On *Master* I have added Master & Slaves IP hostnames to
*regionservers*
file.
 On *slaves*, the regionservers file is empty.


I have run hadoop namenode -format multiple times, but still keep
getting..
"NoNode for /hbase/master". What step did I miss? Thanks foryour
help.

This email message and any attachments are for the sole use of the
intended
recipients and may contain proprietary and/or confidentialinformation
which
may be privileged or otherwise protected from disclosure. Any
unauthorized
review, use, disclosure or distribution is prohibited. If youare not
an
intended recipient, please contact the sender by reply email and
destroy
the
original message and any copies of the message as well as any
attachments to
the original message.

Re: Starting HBase in fully distributed mode...

Reply via email to