The controller log in broker 1 is too late. Could you send its log around 2014-05-12 21:24:37?
Thanks, Jun On Mon, May 12, 2014 at 5:02 PM, Steven Wu <steve...@netflix.com> wrote: > This is a three-node cluster. broker 0 lost connection to ZK. broker 1 > does seem to take the controller role. but broker 0 stuck in the bad state > and wasn't able to recover. > > it seems to start with these error msgs. I have attached complete > controller and server log for broker 0 and 1. > > I am using "kafka_2.9.2-0.8.1.1.jar". > > Thanks, > Steven > > [2014-05-12 21:24:28,737] INFO Client session timed out, have not heard > from server in 4000ms for sessionid 0xb145a9585a806013, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:28,838] INFO zookeeper state changed (Disconnected) > (org.I0Itec.zkclient.ZkClient) > [2014-05-12 21:24:29,360] INFO Opening socket connection to server > ip-10-84-58-49.ec2.internal/10.84.58.49:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:30,562] INFO Client session timed out, have not heard > from server in 1724ms for sessionid 0xb145a9585a806013, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:30,706] INFO Opening socket connection to server > ip-10-45-171-150.ec2.internal/10.45.171.150:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:31,907] INFO Client session timed out, have not heard > from server in 1245ms for sessionid 0xb145a9585a806013, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:32,927] INFO Opening socket connection to server > ip-10-45-172-197.ec2.internal/10.45.172.197:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:34,128] INFO Client session timed out, have not heard > from server in 2120ms for sessionid 0xb145a9585a806013, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:34,380] INFO Opening socket connection to server > ip-10-33-144-217.ec2.internal/10.33.144.217:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:35,581] INFO Client session timed out, have not heard > from server in 1353ms for sessionid 0xb145a9585a806013, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:36,033] INFO Closing socket connection to /10.60.94.175. > (kafka.network.Processor) > [2014-05-12 21:24:37,282] INFO Opening socket connection to server > ip-10-78-235-244.ec2.internal/10.78.235.244:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,284] INFO Socket connection established to > ip-10-78-235-244.ec2.internal/10.78.235.244:2181, initiating session > (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,288] INFO zookeeper state changed (Expired) > (org.I0Itec.zkclient.ZkClient) > [2014-05-12 21:24:37,288] INFO Unable to reconnect to ZooKeeper service, > session 0xb145a9585a806013 has expired, closing socket connection > (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,288] INFO Initiating client connection, connectString= > ec2-50-19-255-1.compute-1.amazonaws.com:2181, > ec2-54-235-159-245.compute-1.amazonaws.com:2181, > ec2-50-19-255-97.compute-1.amazonaws.com:2181, > ec2-184-73-152-248.compute-1.amazonaws.com:2181, > ec2-50-17-247-179.compute-1.amazonaws.com:2181/kafkabroker/defaultsessionTimeout=6000 > watcher=org.I0Itec.zkclient.ZkClient@6f2a4d2f(org.apache.zookeeper.ZooKeeper) > [2014-05-12 21:24:37,305] INFO EventThread shut down > (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,311] INFO Opening socket connection to server > ip-10-171-10-136.ec2.internal/10.171.10.136:2181. Will not attempt to > authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,311] INFO Socket connection established to > ip-10-171-10-136.ec2.internal/10.171.10.136:2181, initiating session > (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,316] INFO Session establishment complete on server > ip-10-171-10-136.ec2.internal/10.171.10.136:2181, sessionid = > 0xb445f1c68e5c013d, negotiated timeout = 6000 > (org.apache.zookeeper.ClientCnxn) > [2014-05-12 21:24:37,316] INFO zookeeper state changed (SyncConnected) > (org.I0Itec.zkclient.ZkClient) > [2014-05-12 21:24:37,335] INFO conflict in /controller data: > {"version":1,"brokerid":0,"timestamp":"1399929877328"} stored data: > {"version":1,"brokerid":1,"timestamp":"1399321265926"} (kafka.utils.: > [2014-05-12 21:24:37,348] INFO re-registering broker info in ZK for broker > 0 (kafka.server.KafkaHealthcheck) > [2014-05-12 21:24:37,352] INFO Registered broker 0 at path /brokers/ids/0 > with address ec2-23-20-93-219.compute-1.amazonaws.com:7101. > (kafka.utils.ZkUtils$) > [2014-05-12 21:24:37,363] INFO done re-registering broker > (kafka.server.KafkaHealthcheck) > [2014-05-12 21:24:37,372] INFO Subscribing to /brokers/topics path to > watch for new topics (kafka.server.KafkaHealthcheck) > [2014-05-12 21:24:37,382] INFO New leader is 1 > (kafka.server.ZookeeperLeaderElector$LeaderChangeListener) > [2014-05-12 21:24:38,791] ERROR Closing socket for /10.165.36.77 because > of error (kafka.network.Processor) > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at kafka.utils.Utils$.read(Utils.scala:375) > at > kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) > at kafka.network.Processor.read(SocketServer.scala:347) > at kafka.network.Processor.run(SocketServer.scala:245) > at java.lang.Thread.run(Thread.java:724) > [2014-05-12 21:24:55,939] INFO Partition [trace_annotation,4] on broker 0: > Shrinking ISR for partition [trace_annotation,4] from 0,2 to 0 > (kafka.cluster.Partition) > [2014-05-12 21:24:55,946] ERROR Conditional update of path > /brokers/topics/trace_annotation/partitions/4/state with data > {"controller_epoch":10,"leader":0,"version":1,"leader_epoch":22,"isr":[0]} > and expected version 76 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /brokers/topics/trace_annotation/partitions/4/state > (kafka.utils.ZkUtils$) > [2014-05-12 21:24:55,958] INFO Partition [trace_annotation,4] on broker 0: > Cached zkVersion [76] not equal to that in zookeeper, skip updating ISR > (kafka.cluster.Partition) > [2014-05-12 21:24:55,959] INFO Partition [merchimpressions_client_log,2] > on broker 0: Shrinking ISR for partition [merchimpressions_client_log,2] > from 0,2 to 0 (kafka.cluster.Partition) > [2014-05-12 21:24:55,963] ERROR Conditional update of path > /brokers/topics/merchimpressions_client_log/partitions/2/state with data > {"controller_epoch":10,"leader":0,"version":1,"leader_epoch":5,"isr":[0]} > and expected version 14 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for > /brokers/topics/merchimpressions_client_log/partitions/2/state > (kafka.utils.ZkUtils$) > [2014-05-12 21:24:55,963] INFO Partition [merchimpressions_client_log,2] > on broker 0: Cached zkVersion [14] not equal to that in zookeeper, skip > updating ISR (kafka.cluster.Partition) > [2014-05-12 21:24:55,963] INFO Partition [request_trace,2] on broker 0: > Shrinking ISR for partition [request_trace,2] from 0,1 to 0 > (kafka.cluster.Partition) > [2014-05-12 21:24:55,967] ERROR Conditional update of path > /brokers/topics/request_trace/partitions/2/state with data > {"controller_epoch":10,"leader":0,"version":1,"leader_epoch":18,"isr":[0]} > and expected version 44 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /brokers/topics/request_trace/partitions/2/state > (kafka.utils.ZkUtils$) > [2014-05-12 21:24:55,967] INFO Partition [request_trace,2] on broker 0: > Cached zkVersion [44] not equal to that in zookeeper, skip updating ISR > (kafka.cluster.Partition) > [2014-05-12 21:24:55,967] INFO Partition [hermesdiagnostics,3] on broker > 0: Shrinking ISR for partition [hermesdiagnostics,3] from 0,2 to 0 > (kafka.cluster.Partition) > [2014-05-12 21:24:55,970] INFO running health check command > (com.netflix.nfkafka.initialization.KafkaHealthCheckHandler) > [2014-05-12 21:24:55,971] ERROR Conditional update of path > /brokers/topics/hermesdiagnostics/partitions/3/state with data > {"controller_epoch":10,"leader":0,"version":1,"leader_epoch":22,"isr":[0]} > and expected version 326 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /brokers/topics/hermesdiagnostics/partitions/3/state > (kafka.utils.ZkUtils$) >