On Mon, 2009-12-07 at 11:23 -0800, Patrick Hunt wrote:
> Richard Dorman wrote:
> > I'm trying to start up a quorum of Zookeeper servers in a cluster;
> > however, Zookeeper is failing to start because it cannot find its
> > hostname in the list of Zookeeper quorum servers.
>
> Can you provide the contents of the ZK log for this? The thing is, we
> don't, as you say, "look up our hostname in the list of zk quorum
> servers". Rather, we rely on the "myid" file, which resides in the data
> directory (you should have created it when you set up the cluster), to
> identify who "we" (meaning the server) are during server start.
>
In my initial post I didn't mention that Zookeeper is being started by
HBase, so the problem is occurring in HQuorumPeer (see the stack trace
below). An exception is raised if the hostname lookup fails; if it
succeeds, the myid is returned.
Here is the .out file:

java.io.IOException: Could not find my address: 0-1c-c0-fa-f-ca in list of ZooKeeper quorum servers
    at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.writeMyID(HQuorumPeer.java:128)
    at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:67)
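For illustration only, here is a simplified, hypothetical sketch of the kind of check the stack trace points at: the peer looks for its own hostname in the comma-separated quorum list and, if it finds it, uses the entry's position as its id; otherwise it throws an IOException with a message like the one above. The class and method names are invented, and using the 0-based list position as the id is an assumption. This is not HBase's actual HQuorumPeer code.

```java
import java.io.IOException;

// Hypothetical, simplified sketch of hostname-based id selection.
// NOT HBase's HQuorumPeer implementation; names and id scheme are assumptions.
final class HostnameMyIdSketch {
    // Returns the 0-based position of myHost in the comma-separated quorum list.
    static int findMyId(String myHost, String quorumProperty) throws IOException {
        String[] hosts = quorumProperty.split(",");
        for (int i = 0; i < hosts.length; i++) {
            if (hosts[i].trim().equalsIgnoreCase(myHost)) {
                return i;
            }
        }
        // Same failure mode as the .out file: no entry matches our own name.
        throw new IOException("Could not find my address: " + myHost
                + " in list of ZooKeeper quorum servers");
    }

    public static void main(String[] args) {
        String quorum = "hbasezookeeper1.local,hbasezookeeper2.local,hbasezookeeper3.local";
        try {
            System.out.println(findMyId("hbasezookeeper2.local", quorum)); // prints 1
            findMyId("0-1c-c0-fa-f-ca", quorum); // default hostname is not in the list
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a node whose default hostname differs from the alias listed in the quorum fails exactly as reported, even though the alias points at the same machine.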
> So:
> 1) the myid file holds the server id
> 2) the config file on each server has something like
>
> server.1=host1:2888:2889
> server.2=host2:2988:2989
>
> where host1's myid file contains "1" and
> host2's myid file contains "2".
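A minimal sketch of the myid-based scheme described above, assuming a zoo.cfg-style properties file and a data directory containing a myid file. The class and method names are hypothetical and this is not ZooKeeper's own config parser; it only shows the order of operations: the id comes from myid first, and only then is the matching server.N entry consulted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch of myid-based identity; not ZooKeeper's actual code.
final class MyIdLookupSketch {
    // Reads the server id from <dataDir>/myid (e.g. a file containing "1").
    static long readMyId(Path dataDir) throws IOException {
        byte[] raw = Files.readAllBytes(dataDir.resolve("myid"));
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
    }

    // Finds "server.<id>=host:port:port" and returns {host, port, port}.
    static String[] serverEntry(String config, long myId) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(config)); // value keeps the embedded colons
        String value = props.getProperty("server." + myId);
        if (value == null) {
            throw new IOException("No server." + myId + " entry in config");
        }
        return value.trim().split(":");
    }
}
```

Nothing in this flow compares the machine's hostname against the list; the id alone selects the entry.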
>
> > I know this problem is well documented on the wiki; however, my
> > situation is a little different. The allocation of a node to become a
> > Zookeeper server is done dynamically by a management service running
> > elsewhere on the cluster. This node then associates its IP with a
> > hostname in the Zookeeper quorum list. That hostname is not the default
> > hostname of the node. The node may associate its IP with multiple
> > hostnames, one for each service it is allocated.
>
> We register a server socket as follows:
>
> ss = new ServerSocket(self.getQuorumAddress().getPort());
>
> Note: we only specify the port number here, not the host name/address.
> This means the socket binds to the wildcard address: it listens on all
> interfaces of the host, for every IP address the host has.
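The wildcard behaviour can be checked directly with the standard java.net.ServerSocket API. This sketch binds once with only a port (as in the snippet above) and once with an explicit loopback address, and reports whether each socket ended up on the wildcard address. Port 0 (an OS-chosen free port) is used only so the demo does not collide with a real quorum port; the class name is hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

final class WildcardBindSketch {
    // Port-only constructor: accepts connections on any/all local addresses.
    static boolean portOnlyIsWildcard() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) { // 0 = any free port (demo only)
            return ss.getInetAddress().isAnyLocalAddress();
        }
    }

    // Explicit bind address: accepts connections only on that one address.
    static boolean loopbackOnlyIsWildcard() throws IOException {
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return ss.getInetAddress().isAnyLocalAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("port only:     wildcard=" + portOnlyIsWildcard());
        System.out.println("loopback only: wildcard=" + loopbackOnlyIsWildcard());
    }
}
```

So the listening side is indifferent to which hostname or IP a peer uses to reach it; only the identity check at startup cares about names.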
>
> > This causes a problem when Zookeeper starts. Zookeeper does a
> > getdefaulthost lookup, which returns the node's default hostname rather
> > than the associated hostname.
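One possible alternative to comparing names, sketched here under the assumption that each quorum alias resolves on the node itself (for example through a hosts file maintained by the management service), is to resolve every quorum entry and check whether any resulting address belongs to a local interface. The class and method names are hypothetical; InetAddress and NetworkInterface are standard java.net APIs.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical sketch: match by address instead of by default hostname.
final class LocalAddressCheckSketch {
    // True if host resolves to an address owned by this machine.
    static boolean isLocal(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()
                        || NetworkInterface.getByInetAddress(addr) != null) {
                    return true;
                }
            }
        } catch (UnknownHostException | SocketException e) {
            // Unresolvable name or interface query failure: treat as not local.
        }
        return false;
    }

    // Index of the first quorum entry that points at this machine, or -1.
    static int indexOfLocal(String quorumProperty) {
        String[] hosts = quorumProperty.split(",");
        for (int i = 0; i < hosts.length; i++) {
            if (isLocal(hosts[i].trim())) {
                return i;
            }
        }
        return -1;
    }
}
```

The trade-off is that every quorum name must resolve when the peer starts; an alias that has not yet been registered is simply treated as remote.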
>
> As I mentioned, I'd like to see the log for this error.
>
> > So my questions are:
> >
> > 1. Is it possible to resolve this some other way? We are not running DNS
> > (hostname associations are managed by our own services). We also cannot
> > use the nodes' IP addresses, as the nodes are allocated dynamically.
> > Dynamically updating the config files is also not practical.
> >
> > 2. Why does Zookeeper need to test whether its hostname is in the
> > Zookeeper quorum list? Can this check safely be disabled?
>
> AFAICT we are not doing this. If you could send your config file as well,
> it would be interesting to see alongside the log of the error.
>
> Is this EC2 or something else? What version of ZK are you running?
>
This isn't EC2. It's a custom solution, so we have more flexibility when
configuring our cluster.
I'm running HBase 0.20.2, which includes Zookeeper 3.2.1.
Other than specifying a quorum of 5 servers, all config settings are the
defaults set by HBase (it seems redundant to paste them here). My quorum
looks like the following:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbasezookeeper1.local,hbasezookeeper2.local,hbasezookeeper3.local,hbasezookeeper4.local,hbasezookeeper5.local</value>
</property>
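When HBase manages ZooKeeper itself, the hbase.zookeeper.quorum value is, roughly speaking, turned into server.N entries like the ones in the config example earlier in the thread. The sketch below shows one plausible mapping; starting ids at 0 and the 2888/3888 peer/leader ports are assumptions not taken from this thread, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive server.N lines from a comma-separated quorum.
// Id numbering (from 0) and port values are assumptions for illustration.
final class QuorumToServerLinesSketch {
    static List<String> serverLines(String quorum, int peerPort, int leaderPort) {
        List<String> lines = new ArrayList<>();
        String[] hosts = quorum.split(",");
        for (int id = 0; id < hosts.length; id++) {
            lines.add("server." + id + "=" + hosts[id].trim() + ":" + peerPort + ":" + leaderPort);
        }
        return lines;
    }

    public static void main(String[] args) {
        String quorum = "hbasezookeeper1.local,hbasezookeeper2.local,hbasezookeeper3.local,"
                + "hbasezookeeper4.local,hbasezookeeper5.local";
        for (String line : serverLines(quorum, 2888, 3888)) {
            System.out.println(line);
        }
    }
}
```

Under this kind of mapping, the id a peer should write to myid depends on which alias it owns, which is why a name that differs from the node's default hostname breaks the lookup.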
As a temporary solution, I have created a patch for HQuorumPeer that
pulls the hostname from an environment variable, which is set just before
starting Zookeeper on a node, but I doubt that this is a good long-term
fix.
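A minimal sketch of the environment-variable workaround described above. The variable name HBASE_ZK_HOSTNAME and the class and method names are hypothetical, since the actual patch is not shown in this thread. The override is used when it is set and non-blank; otherwise the default hostname is kept.

```java
import java.net.InetAddress;
import java.util.Map;

// Hypothetical sketch of an env-var hostname override; not the actual patch.
final class HostnameOverrideSketch {
    // Hypothetical variable name, set by the management service before startup.
    static final String OVERRIDE_VAR = "HBASE_ZK_HOSTNAME";

    // Prefer the override from env; fall back to the default hostname.
    static String resolveMyHostname(Map<String, String> env, String defaultHost) {
        String override = env.get(OVERRIDE_VAR);
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return defaultHost;
    }

    public static void main(String[] args) throws Exception {
        String defaultHost = InetAddress.getLocalHost().getCanonicalHostName();
        System.out.println(resolveMyHostname(System.getenv(), defaultHost));
    }
}
```

Passing the environment in as a Map keeps the lookup testable without touching the real process environment.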
> Patrick
>