[ 
https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063110#comment-15063110
 ] 

Gwen Shapira commented on KAFKA-3004:
-------------------------------------

Long GC pauses are possibly causing Kafka to lose its ZK session, which in 
turn triggers the failover.

You can:
1. Enable G1GC, which does a much better job of keeping pauses short
2. Raise zookeeper.session.timeout.ms to something a bit higher (the default 
is 6000 ms)
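
To check whether GC pauses are in the range that could plausibly expire the session, something like the sketch below can scan a GC log for long wall-clock pauses. It is only an illustration: the function name long_pauses and the 50% threshold are assumptions made here, not anything Kafka or ZooKeeper defines, and the regex assumes the "[Times: ... real=N secs]" format shown in the GC log from the report.

```python
import re

# Matches the wall-clock ("real") time at the end of a HotSpot GC log entry,
# e.g. "[Times: user=0.05 sys=0.00, real=5.46 secs]".
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(gc_log_text, session_timeout_ms=6000, fraction=0.5):
    """Return real pause times (in ms) above `fraction` of the ZK session timeout.

    `fraction` is an arbitrary warning threshold chosen for this sketch.
    """
    threshold_ms = session_timeout_ms * fraction
    pauses_ms = [round(float(s) * 1000) for s in REAL_PAUSE.findall(gc_log_text)]
    return [p for p in pauses_ms if p > threshold_ms]


# The GC log entry quoted in the report.
sample = (
    "2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000: "
    "15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs] "
    "576345K->297570K(1013632K), 5.4531250 secs] [Times: user=0.05 sys=0.00, "
    "real=5.46 secs]"
)

print(long_pauses(sample))
```

With the default 6000 ms timeout, the 5.46 second pause from the report is flagged. Passing a larger session_timeout_ms shows how much headroom a higher timeout would give.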

> Brokers failing over repeatedly
> -------------------------------
>
>                 Key: KAFKA-3004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3004
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, network
>    Affects Versions: 0.8.2.0
>         Environment: Centos 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
>            Reporter: Sadek
>            Assignee: Neha Narkhede
>
> While doing load testing we have noticed that one or more brokers will 
> un-register/register almost every hour with the following entry in its log:
> INFO [SessionExpirationListener on 4], ZK expired; shut down all controller 
> components and try to re-elect 
> (kafka.controller.KafkaController$SessionExpirationListener)
> I noticed an increase in minor GC activity around the same time:
> 2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000: 
> 15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs] 
> 576345K->297570K(1013632K), 5.4531250 secs] [Times: user=0.05 sys=0.00, 
> real=5.46 secs]
> There was also a disk I/O spike on the Kafka nodes.
>  
> Here's a snippet of the broker log around that time
> [2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected) 
> (org.I0Itec.zkclient.ZkClient)
> 15754934 [main-SendThread(kfk02.local:2182)] INFO 
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard 
> from server in 12203ms for sessionid 0x151b10503e60002, closing socket 
> connection and attempting reconnect
> [2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected) 
> (org.I0Itec.zkclient.ZkClient)
> 15755399 [main-SendThread(kfk01.local:2182)] INFO 
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server 
> kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL 
> (unknown error)
> 15755400 [main-SendThread(kfk01.local:2182)] INFO 
> org.apache.zookeeper.ClientCnxn - Socket connection established to 
> kfk01.local/10.124.80.140:2182, initiating session
> 15755401 [main-SendThread(kfk01.local:2182)] INFO 
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server 
> kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated 
> timeout = 12000
> [2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected) 
> (org.I0Itec.zkclient.ZkClient)
> Any idea what may be causing this?
> Thanks!



