chendihao created HBASE-10283:
---------------------------------

             Summary: Client can't connect with all the running zk servers in 
MiniZooKeeperCluster
                 Key: HBASE-10283
                 URL: https://issues.apache.org/jira/browse/HBASE-10283
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.3
            Reporter: chendihao


As of HBASE-3052, multiple zk servers can run together in a minicluster. The 
problem is that the client can only connect to the first zk server, and if you 
kill that one, the client can no longer access the cluster even though the 
other zk servers are still serving.

It's easy to reproduce. First, call `TEST_UTIL.startMiniZKCluster(3)`. Second, 
call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then, when you 
construct a zk client, it cannot connect to the zk cluster at all. Here is a 
short log excerpt for reference.
{noformat}
2014-01-03 12:06:58,625 INFO  [main] zookeeper.MiniZooKeeperCluster(194): 
Started MiniZK Cluster and connect 1 ZK server on client port: 55227
......
2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(264): Kill 
the current active ZK servers in the cluster on client port: 55227
2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(272): 
Activate a backup zk server in the cluster on client port: 55228
2014-01-03 12:06:59,366 INFO  [main-EventThread] zookeeper.ZooKeeper(434): 
Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 
watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
{noformat}

The log is also misleading: it always shows "Started MiniZK Cluster and 
connect 1 ZK server" even though there are actually three zk servers.

Looking deeper, we find that the client is still trying to connect to the dead 
zk server's port. When I print out the zkQuorum it uses, only the first zk 
server's host:port is there, and it does not change whether or not you kill 
that server. The reason lies in ZKConfig, which converts HBase settings into 
zk's. MiniZooKeeperCluster creates three servers with the same host name, 
"localhost", and different ports. But HBase itself uses only the one port, and 
ZKConfig ignores the other two servers because they have the same host name.
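
To make the effect described above concrete, here is a minimal, self-contained 
sketch. It is a simplified model of the reported behaviour, not the actual 
ZKConfig code: the class and method names are hypothetical, and the third port 
(55229) is made up for illustration since only 55227 and 55228 appear in the 
log. It shows how de-duplicating by host name with a single shared client port 
collapses three local servers into one quorum entry, and how keeping one 
host:port entry per server might avoid that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class QuorumSketch {

    // Simplified model of the reported behaviour (NOT the real ZKConfig code):
    // hosts are de-duplicated by name and all share one client port.
    static String quorumWithSharedPort(List<String> hosts, int clientPort) {
        Set<String> uniqueHosts = new LinkedHashSet<>(hosts);
        List<String> parts = new ArrayList<>();
        for (String host : uniqueHosts) {
            parts.add(host + ":" + clientPort);
        }
        return String.join(",", parts);
    }

    // Hypothetical alternative: keep one host:port entry per server, so
    // servers that share a host name are still distinguishable.
    static String quorumWithPerServerPorts(String host, int[] clientPorts) {
        List<String> parts = new ArrayList<>();
        for (int port : clientPorts) {
            parts.add(host + ":" + port);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // Three mini-cluster servers, all on "localhost".
        List<String> hosts = Arrays.asList("localhost", "localhost", "localhost");

        // Only the first server's port survives.
        System.out.println(quorumWithSharedPort(hosts, 55227));

        // 55229 is an assumed port, used here for illustration only.
        System.out.println(
            quorumWithPerServerPorts("localhost", new int[] {55227, 55228, 55229}));
    }
}
```

With the first form, killing the server on port 55227 leaves the client with no 
reachable address. With the second form, the ZooKeeper client would have the 
remaining servers to fall back on.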

MiniZooKeeperCluster will not work properly until we fix this. The bug went 
unnoticed because our unit tests never check whether HBase still works after 
the active or backup zk servers are killed. We clearly should test that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
