[ https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sadek updated KAFKA-3004:
-------------------------
Description:
During load testing, we noticed that the controller fails over almost every
hour, with the following entry in its log:
INFO [SessionExpirationListener on 4], ZK expired; shut down all controller
components and try to re-elect
(kafka.controller.KafkaController$SessionExpirationListener)
I also see an increase in minor GC activity around the same time:
2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000:
8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs]
603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00,
real=5.71 secs]
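One way to read the GC entry above is to compare wall-clock time with CPU time in its `[Times: ...]` block: here the collection used 0.05 s of CPU but took 5.71 s of real time, which may indicate that the JVM was stalled (for example by swapping or I/O) rather than busy collecting. The following is a minimal Python sketch of that check. The function names and the `ratio` and `min_real` thresholds are illustrative assumptions, not part of Kafka or the JVM.

```python
import re

# The GC entry from the report, joined back into a single line.
GC_LINE = (
    "2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000: "
    "8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs] "
    "603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00, "
    "real=5.71 secs]"
)

# Matches the trailing "[Times: user=... sys=..., real=... secs]" block.
TIMES_RE = re.compile(
    r"\[Times: user=([\d.]+) sys=([\d.]+),\s*real=([\d.]+) secs\]"
)


def parse_gc_times(line):
    """Return user/sys/real seconds from a GC log line, or None if absent."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    user, sys_time, real = (float(g) for g in match.groups())
    return {"user": user, "sys": sys_time, "real": real}


def is_stall(times, ratio=10.0, min_real=1.0):
    """Flag pauses whose wall-clock time far exceeds the CPU time spent.

    ratio and min_real are arbitrary thresholds chosen for illustration.
    """
    cpu = times["user"] + times["sys"]
    return times["real"] >= min_real and times["real"] > ratio * max(cpu, 0.01)


times = parse_gc_times(GC_LINE)
print(times, is_stall(times))
```

A pause flagged this way is worth comparing against the ZooKeeper session timeout, since a stall of similar length can keep the broker from sending heartbeats in time.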
Here's a snippet of the broker log around that time:
[2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected)
(org.I0Itec.zkclient.ZkClient)
15754934 [main-SendThread(kfk02.local:2182)] INFO
org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from
server in 12203ms for sessionid 0x151b10503e60002, closing socket connection
and attempting reconnect
[2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected)
(org.I0Itec.zkclient.ZkClient)
15755399 [main-SendThread(kfk01.local:2182)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL
(unknown error)
15755400 [main-SendThread(kfk01.local:2182)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
kfk01.local/10.124.80.140:2182, initiating session
15755401 [main-SendThread(kfk01.local:2182)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated
timeout = 12000
[2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected)
(org.I0Itec.zkclient.ZkClient)
Any idea what may be causing this?
Thanks!
was:
During load testing, we noticed that the controller fails over almost every
hour, with the following entry in its log:
INFO [SessionExpirationListener on 4], ZK expired; shut down all controller
components and try to re-elect
(kafka.controller.KafkaController$SessionExpirationListener)
I also see an increase in minor GC activity around the same time:
2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000:
8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs]
603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00,
real=5.71 secs]
I tried increasing zookeeper.connection.timeout.ms to 60000, but it does not
seem to help, and I still see the default value (6000) in the ZK logs:
INFO org.apache.zookeeper.server.ZooKeeperServer - Established session
0x351b0090ea80000 with negotiated timeout 6000 for client /10......
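A possible explanation for the unchanged value: the session timeout a Kafka broker requests comes from zookeeper.session.timeout.ms (default 6000 in 0.8.2), not from zookeeper.connection.timeout.ms, and the ZooKeeper server then clamps the requested value between minSessionTimeout and maxSessionTimeout (by default 2 and 20 times tickTime). The Python sketch below illustrates that clamping. It assumes tickTime = 2000, the value in ZooKeeper's sample configuration; the tickTime of this cluster is not given in the report, and the function name is hypothetical.

```python
def negotiated_timeout(requested_ms, tick_time_ms=2000, min_ms=None, max_ms=None):
    """Approximate the session timeout a ZooKeeper server would grant.

    tick_time_ms=2000 is an assumption (ZooKeeper's sample zoo.cfg value).
    When min_ms / max_ms are not set, the server defaults are used:
    2 * tickTime and 20 * tickTime.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


# Requested values: the Kafka 0.8.2 default, 12000, and 60000.
for requested in (6000, 12000, 60000):
    print(requested, "->", negotiated_timeout(requested))
```

Under these assumptions, a request of 60000 would be granted only 40000 unless maxSessionTimeout is raised on the ZooKeeper servers.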
Any idea what may be causing this?
Thanks!
> Controller failing over repeatedly
> ----------------------------------
>
> Key: KAFKA-3004
> URL: https://issues.apache.org/jira/browse/KAFKA-3004
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8.2.0
> Environment: Centos 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
> Reporter: Sadek
> Assignee: Neha Narkhede
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)