Kanak Biscuitwala created HELIX-321:
---------------------------------------
Summary: Controller forgets that it's the leader
Key: HELIX-321
URL: https://issues.apache.org/jira/browse/HELIX-321
Project: Apache Helix
Issue Type: Bug
Reporter: Kanak Biscuitwala
Attachments: leader_election.txt
1. ZooKeeper session times out and expires; see log messages:
INFO [2013-11-22 17:34:11,919] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from
server in 20171ms for sessionid 0x142016175c10856, closing socket connection
and attempting reconnect
INFO [2013-11-22 17:34:22,051] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
eat1-app87.corp/172.18.158.133:2181
INFO [2013-11-22 17:34:22,052] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Socket connection established to
eat1-app87.corp/172.18.158.133:2181, initiating session
INFO [2013-11-22 17:34:22,055] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
session 0x142016175c10856 has expired, closing socket connection
INFO [2013-11-22 17:34:22,055] main-EventThread - org.I0Itec.zkclient.ZkClient
- zookeeper state changed (Expired)
INFO [2013-11-22 17:34:22,055] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.manager.zk.ZkHelixConnection - KeeperState:Expired,
expiredSessionId: 142016175c10856
2. Controller reconnects, removes all callbacks
INFO [2013-11-22 17:34:22,068] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Socket connection established to
eat1-app87.corp/172.18.158.133:2181, initiating session
INFO [2013-11-22 17:34:22,126] main-SendThread(eat1-app87.corp:2181) -
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
eat1-app87.corp/172.18.158.133:2181, sessionid = 0x142016175c1085c, negotiated
timeout = 30000
INFO [2013-11-22 17:34:22,126] main-EventThread - org.I0Itec.zkclient.ZkClient
- zookeeper state changed (SyncConnected)
3. Callbacks ignored; controller is not leader, relinquishes leadership
ERROR [2013-11-22 17:34:22,187] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.controller.GenericHelixController - Cluster manager:
controller1 is not leader. Pipeline will not be invoked
INFO [2013-11-22 17:34:22,200] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 reqlinquishes
leadership of cluster: perf-test-cluster
4. Controller reacquires leadership
INFO [2013-11-22 17:34:22,204] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to
acquire leadership for cluster: perf-test-cluster
INFO [2013-11-22 17:34:22,215] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 acquires
leadership of cluster: perf-test-cluster
5. Controller thinks it's not leader even though the LEADER node is in place
and correct
ERROR [2013-11-22 17:34:22,294] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.controller.GenericHelixController - Cluster manager:
controller1 is not leader. Pipeline will not be invoked
6. Controller tries to become leader when it already is???
INFO [2013-11-22 17:34:22,335] ZkClient-EventThread-10-eat1-app87.corp:2181 -
org.apache.helix.manager.zk.ZkHelixLeaderElection - controller1 is trying to
acquire leadership for cluster: perf-test-cluster
Logs attached
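For anyone going through the attached log, the sequence above can be pulled out with a small filter. This is only a rough sketch: the marker strings are copied from the messages quoted in this report (including the "reqlinquishes" spelling as logged), while the event labels, the function names, and the assumption that leader_election.txt uses the same "LEVEL [timestamp] thread - logger - message" layout with wrapped lines are hypothetical and not part of Helix.

```python
import re
import sys

# Start of a log entry as it appears in this report: LEVEL [timestamp] ...
ENTRY_START = re.compile(
    r"^(INFO|WARN|ERROR|DEBUG) \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
)

# Substrings taken from the messages quoted in this report.
# The labels on the right are made up for this sketch.
MARKERS = [
    ("Client session timed out", "session-timeout"),
    ("has expired", "session-expired"),
    ("Session establishment complete", "new-session"),
    ("is not leader", "not-leader"),
    ("reqlinquishes leadership", "relinquish"),
    ("is trying to acquire leadership", "try-acquire"),
    ("acquires leadership", "acquired"),
]


def join_entries(lines):
    """Merge wrapped continuation lines back into one string per entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line):
            entries.append(line)
        elif entries and line.strip():
            # Any non-blank line that does not start an entry is treated
            # as a continuation of the previous one.
            entries[-1] += " " + line.strip()
    return entries


def timeline(lines):
    """Return (timestamp, label) for each entry that matches a marker."""
    events = []
    for entry in join_entries(lines):
        timestamp = ENTRY_START.match(entry).group(2)
        for needle, label in MARKERS:
            if needle in entry:
                events.append((timestamp, label))
                break
    return events


if __name__ == "__main__":
    # Optional: pass the path of a log file as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log_file:
            for timestamp, label in timeline(log_file):
                print(timestamp, label)
```

Run against a log with the layout assumed above, a "not-leader" event that follows an "acquired" event with no "relinquish" in between would correspond to steps 5 and 6.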