[jira] [Comment Edited] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException
[ https://issues.apache.org/jira/browse/KAFKA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331377#comment-15331377 ]

Simon Cooper edited comment on KAFKA-3837 at 6/15/16 8:23 AM:
--------------------------------------------------------------

I would say the thread name is much more useful. This is specifically aimed at debugging when this exception happens; often the process has crashed out, so the thread id doesn't tell you anything. The thread name is descriptive, and often has info on what the thread's purpose is.

was (Author: thecoop1984):
I would say the thread name is much more useful. This is specifically aimed at debugging when this exception happens; often the process has crashed out, so the thread id doesn't tell you anything. The thread name is descriptive.

> Report the name of the blocking thread when throwing
> ConcurrentModificationException
> ----------------------------------------------------
>
>                 Key: KAFKA-3837
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3837
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 0.9.0.1
>            Reporter: Simon Cooper
>            Assignee: Bharat Viswanadham
>            Priority: Minor
>
> {{KafkaConsumer.acquire}} throws {{ConcurrentModificationException}} if the
> current thread does not match the {{currentThread}} field. It would be
> useful if the name of the other thread was included in the exception message
> to help debug the problem when this exception occurs.
> As it stands, it can be really difficult to work out what's going wrong when
> there are several threads all accessing the consumer at the same time, and
> your existing exclusive access logic doesn't seem to be working as it should.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException
[ https://issues.apache.org/jira/browse/KAFKA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331377#comment-15331377 ]

Simon Cooper commented on KAFKA-3837:
-------------------------------------

I would say the thread name is much more useful. This is specifically aimed at debugging when this exception happens; often the process has crashed out, so the thread id doesn't tell you anything. The thread name is descriptive.
[jira] [Created] (KAFKA-3837) Report the name of the blocking thread when throwing ConcurrentModificationException
Simon Cooper created KAFKA-3837:
-----------------------------------

             Summary: Report the name of the blocking thread when throwing ConcurrentModificationException
                 Key: KAFKA-3837
                 URL: https://issues.apache.org/jira/browse/KAFKA-3837
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 0.9.0.1
            Reporter: Simon Cooper
            Priority: Minor

{{KafkaConsumer.acquire}} throws {{ConcurrentModificationException}} if the current thread does not match the {{currentThread}} field. It would be useful if the name of the other thread was included in the exception message to help debug the problem when this exception occurs.

As it stands, it can be really difficult to work out what's going wrong when there are several threads all accessing the consumer at the same time, and your existing exclusive access logic doesn't seem to be working as it should.
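The change requested above can be sketched in isolation. The following is a hypothetical stand-in, not Kafka's actual {{KafkaConsumer.acquire}} implementation (the class and method names here are invented for illustration): record the owning {{Thread}} rather than only its id, so the {{ConcurrentModificationException}} message can name the blocking thread.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-threaded-access guard in the style of
// KafkaConsumer.acquire/release. Storing the owning Thread (not just its id)
// lets the error message report the blocker's descriptive name.
class ThreadAccessGuard {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void acquire() {
        Thread current = Thread.currentThread();
        // Re-acquire by the owning thread is allowed; otherwise claim or fail.
        if (owner.get() != current && !owner.compareAndSet(null, current)) {
            Thread blocker = owner.get();
            throw new ConcurrentModificationException(
                "Consumer is not safe for multi-threaded access: thread '"
                    + current.getName() + "' blocked by thread '"
                    + (blocker == null ? "unknown" : blocker.getName()) + "'");
        }
    }

    public void release() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

With this shape, the exception from a second thread reports both thread names instead of failing silently on identity alone, which is exactly the debugging aid the ticket asks for.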
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210419#comment-15210419 ]

Simon Cooper commented on KAFKA-3296:
-------------------------------------

No, these are the only references to Expired in the logs:

{code}
[aric@reg3 ~]$ grep Expired /aric/logs/kafka/*
/aric/logs/kafka/server.log:2016-03-24T14:29:08,207 | INFO | k.s.DelayedOperationPurgatory$ExpiredOperationReaper [ExpirationReaper-1] | [ExpirationReaper-1], Starting
/aric/logs/kafka/server.log:2016-03-24T14:29:08,208 | INFO | k.s.DelayedOperationPurgatory$ExpiredOperationReaper [ExpirationReaper-1] | [ExpirationReaper-1], Starting
/aric/logs/kafka/server.log:2016-03-24T14:29:08,360 | INFO | k.s.DelayedOperationPurgatory$ExpiredOperationReaper [ExpirationReaper-1] | [ExpirationReaper-1], Starting
/aric/logs/kafka/server.log:2016-03-24T14:29:08,369 | INFO | k.s.DelayedOperationPurgatory$ExpiredOperationReaper [ExpirationReaper-1] | [ExpirationReaper-1], Starting
{code}

There are also previous debug logs from the broker and controller already attached to the ticket.

> All consumer reads hang indefinitely
> ------------------------------------
>
>                 Key: KAFKA-3296
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3296
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0, 0.9.0.1
>            Reporter: Simon Cooper
>            Priority: Critical
>         Attachments: controller.zip, kafkalogs.zip
>
> We've got several integration tests that bring up systems on VMs for testing.
> We've recently upgraded to 0.9, and very occasionally we see an issue where
> every consumer that tries to read from the broker hangs, spamming the
> following in their logs:
> {code}
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537856, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537857, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537956, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537957, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489538056, sendTimeMs=0) to node 1
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata
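For reference, the repeated `error_code=15` in the group metadata responses above is the Kafka protocol's coordinator-not-available error, which is why the client keeps re-issuing the group metadata request forever. A small hypothetical helper (standalone, not part of the Kafka client) that pulls the error code out of such debug lines and names the codes seen here:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the client's "Group metadata response" debug lines.
// Maps the subset of Kafka protocol error codes relevant to this ticket.
class GroupMetadataLogParser {
    private static final Pattern ERROR_CODE = Pattern.compile("error_code=(\\d+)");
    private static final Map<Integer, String> NAMES = Map.of(
        0, "NONE",
        15, "COORDINATOR_NOT_AVAILABLE");

    public static String errorName(String logLine) {
        Matcher m = ERROR_CODE.matcher(logLine);
        if (!m.find()) return "NO_ERROR_CODE_IN_LINE";
        int code = Integer.parseInt(m.group(1));
        return NAMES.getOrDefault(code, "UNKNOWN(" + code + ")");
    }
}
```

Running this over the lines above flags every response as COORDINATOR_NOT_AVAILABLE, matching the `coordinator={node_id=-1,host=,port=-1}` placeholder in the same responses.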
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208201#comment-15208201 ]

Simon Cooper commented on KAFKA-3296:
-------------------------------------

{code}
Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=uncompressed
    Topic: __consumer_offsets    Partition: 0    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 1    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 2    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 3    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 4    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 5    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 6    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 7    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 8    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 9    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 10    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 11    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 12    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 13    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 14    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 15    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 16    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 17    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 18    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 19    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 20    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 21    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 22    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 23    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 24    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 25    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 26    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 27    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 28    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 29    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 30    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 31    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 32    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 33    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 34    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 35    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 36    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 37    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 38    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 39    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 40    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 41    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 42    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 43    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 44    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 45    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 46    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 47    Leader: none    Replicas: 1    Isr:
    Topic: __consumer_offsets    Partition: 48    Leader: none
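The describe output above shows the underlying problem directly: every `__consumer_offsets` partition reports `Leader: none` with an empty ISR, so no broker can act as group coordinator. A hypothetical one-off check (not a Kafka tool) that scans `kafka-topics.sh --describe` output for leaderless partitions:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: given the lines of `kafka-topics.sh --describe` output,
// return the partition lines that currently have no leader ("Leader: none").
class LeaderlessPartitions {
    public static List<String> find(List<String> describeLines) {
        return describeLines.stream()
            .filter(l -> l.contains("Leader: none"))
            .collect(Collectors.toList());
    }
}
```

Against the output above this would return all 50 partitions, confirming that consumer group operations cannot make progress until leadership is restored.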
[jira] [Comment Edited] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187284#comment-15187284 ]

Simon Cooper edited comment on KAFKA-3296 at 3/9/16 4:07 PM:
-------------------------------------------------------------

Are the logs useful? Is there any more information we can provide? We've also started seeing issues where clients can't get metadata from the broker until both zookeeper and kafka are restarted...

was (Author: thecoop1984):
Are the logs useful? We've also started seeing a (possibly unrelated) issue - a new kafka install doesn't work, putting this in the logs:

{code}
2016-03-09T15:43:22,998 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:delete cxid:0x93 zxid:0x579 txntype:-1 reqpath:n/a Error Path:/kafkasite1/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /kafkasite1/admin/preferred_replica_election
2016-03-09T15:43:23,142 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0x9a zxid:0x57a txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers Error:KeeperErrorCode = NodeExists for /kafkasite1/brokers
2016-03-09T15:43:23,142 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0x9b zxid:0x57b txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/ids Error:KeeperErrorCode = NodeExists for /kafkasite1/brokers/ids
2016-03-09T15:43:23,309 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa2 zxid:0x57d txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/19 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/19
2016-03-09T15:43:23,313 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa3 zxid:0x57e txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions
2016-03-09T15:43:23,325 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa7 zxid:0x582 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/30 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/30
2016-03-09T15:43:23,341 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xaa zxid:0x585 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/47 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/47
2016-03-09T15:43:23,349 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xad zxid:0x588 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/29 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/29
2016-03-09T15:43:23,357 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xb1 zxid:0x58b txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/41 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/41
2016-03-09T15:43:23,367 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xba zxid:0x58e txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/39 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/39
{code}

Any client connection can't get topic metadata until both zookeeper and kafka are restarted.
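A note on the ZooKeeper lines above: "user-level KeeperException" entries logged at INFO are frequently benign (NoNode/NodeExists results from create-if-absent patterns during startup), so they may not explain the metadata failure by themselves. A hypothetical parser for picking the error code and znode path out of those lines, to separate the NoNode and NodeExists cases when triaging:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract the KeeperErrorCode and znode path from
// ZooKeeper's "Got user-level KeeperException" log lines.
class KeeperExceptionLine {
    private static final Pattern P =
        Pattern.compile("Error:KeeperErrorCode = (\\w+) for (\\S+)");

    // Returns {errorCode, path} if the line matches, empty otherwise.
    public static Optional<String[]> parse(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? Optional.of(new String[] {m.group(1), m.group(2)})
                        : Optional.empty();
    }
}
```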
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187284#comment-15187284 ]

Simon Cooper commented on KAFKA-3296:
-------------------------------------

Are the logs useful? We've also started seeing a (possibly unrelated) issue - a new kafka install doesn't work, putting this in the logs:

{code}
2016-03-09T15:43:22,998 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:delete cxid:0x93 zxid:0x579 txntype:-1 reqpath:n/a Error Path:/kafkasite1/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /kafkasite1/admin/preferred_replica_election
2016-03-09T15:43:23,142 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0x9a zxid:0x57a txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers Error:KeeperErrorCode = NodeExists for /kafkasite1/brokers
2016-03-09T15:43:23,142 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0x9b zxid:0x57b txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/ids Error:KeeperErrorCode = NodeExists for /kafkasite1/brokers/ids
2016-03-09T15:43:23,309 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa2 zxid:0x57d txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/19 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/19
2016-03-09T15:43:23,313 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa3 zxid:0x57e txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions
2016-03-09T15:43:23,325 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xa7 zxid:0x582 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/30 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/30
2016-03-09T15:43:23,341 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xaa zxid:0x585 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/47 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/47
2016-03-09T15:43:23,349 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xad zxid:0x588 txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/29 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/29
2016-03-09T15:43:23,357 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xb1 zxid:0x58b txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/41 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/41
2016-03-09T15:43:23,367 | INFO | o.a.z.s.PrepRequestProcessor [ProcessThread(sid:0 cport:-1):] | Got user-level KeeperException when processing sessionid:0x1535bf80f4e004a type:create cxid:0xba zxid:0x58e txntype:-1 reqpath:n/a Error Path:/kafkasite1/brokers/topics/__consumer_offsets/partitions/39 Error:KeeperErrorCode = NoNode for /kafkasite1/brokers/topics/__consumer_offsets/partitions/39
{code}

Any client connection can't get topic metadata until both zookeeper and kafka are restarted.
[jira] [Updated] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Cooper updated KAFKA-3296:
--------------------------------
    Attachment: controller.zip

Attached debug-level kafka controller logs. Also attached the zookeeper daemon logs for the same period.
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173539#comment-15173539 ]

Simon Cooper commented on KAFKA-3296:
-------------------------------------

It's definitely a single broker, and a standard single-node zookeeper setup. Occasionally we do see warnings from zookeeper in the logs when everything is set up - something along the lines of 'fsync took 12000ms, this may adversely affect performance'. If ZK takes a long time to send updates/watch notifications, could this cause the controller to think it isn't the leader? We'll turn on controller debug logging for the next time this happens.
[jira] [Updated] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Cooper updated KAFKA-3296:
Attachment: kafkalogs.zip
These are debug logs from the broker when it's in this broken state.

> All consumer reads hang indefinitely
>
> Key: KAFKA-3296
> URL: https://issues.apache.org/jira/browse/KAFKA-3296
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.0, 0.9.0.1
> Reporter: Simon Cooper
> Priority: Critical
> Attachments: kafkalogs.zip
>
> We've got several integration tests that bring up systems on VMs for testing. We've recently upgraded to 0.9, and very occasionally we see an issue where every consumer that tries to read from the broker hangs, spamming the following in their logs:
> {code}
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537856, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537857, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537956, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537957, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489538056, sendTimeMs=0) to node 1
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489538057, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489538056,
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171739#comment-15171739 ] Simon Cooper commented on KAFKA-3296: - This is done right after kafka is installed on the VM - we install the binaries, start up the broker, and create some topics. From that point, consumers don't work. We've put debug logging on the server, so the next time this happens, we'll have broker debug logs.
[jira] [Comment Edited] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169007#comment-15169007 ] Simon Cooper edited comment on KAFKA-3296 at 2/26/16 1:26 PM:
--
Example code that breaks, which gets the start and end offset of the topic:
{code}
Consumer c = ...
TopicPartition tp = ...
c.assign(ImmutableList.of(tp));
...
c.seekToEnd(tp);
long endOffset = c.position(tp);
c.seekToBeginning(tp);
long startOffset = c.position(tp);
{code}
We also observe this behaviour with other uses of Consumer in our codebase; it looks like any call to position() or poll() hangs. We also get the same hang when using kafka-console-consumer (no messages are received when using the new consumer; the old consumer works fine).

was (Author: thecoop1984):
Example code that breaks, which gets the start and end offset of the topic:
{code}
Consumer c = ...
TopicPartition tp = ...
c.assign(ImmutableList.of(tp));
...
c.seekToEnd(tp);
long endOffset = c.position(tp);
c.seekToBeginning(tp);
long startOffset = c.position(tp);
{code}
We also observe this behaviour with other uses of Consumer in our codebase; it looks like any call to position() or poll() hangs.
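The snippet in the comment above elides its setup. A self-contained sketch of the same seek-then-position pattern might look like the following. This is an illustration, not the reporter's actual code: the broker address and topic name are placeholders, and it is written against the Collection-based consumer API of newer kafka-clients versions (the 0.9-era API used here by the reporter took varargs in seekToBeginning/seekToEnd).

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class TopicOffsetBounds {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; the reporter's brokers ran on VMs.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Placeholder topic; the logs in this issue reference "Topic1".
        TopicPartition tp = new TopicPartition("Topic1", 0);

        try (Consumer<byte[], byte[]> c = new KafkaConsumer<>(props)) {
            // Manual partition assignment, as in the report: no group rebalance.
            c.assign(Collections.singletonList(tp));

            // seekToEnd/seekToBeginning only record the intent to reset;
            // the actual offset lookup happens inside position(), which is
            // where the reporter observes the indefinite hang.
            c.seekToEnd(Collections.singletonList(tp));
            long endOffset = c.position(tp);

            c.seekToBeginning(Collections.singletonList(tp));
            long startOffset = c.position(tp);

            System.out.println("start=" + startOffset + " end=" + endOffset);
        }
    }
}
```

The repeated `responseBody={error_code=15,...}` in the attached logs is the broker answering the group coordinator lookup with "coordinator not available", which is consistent with position() blocking while the client retries, rather than the seek calls themselves hanging.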
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169000#comment-15169000 ] Simon Cooper commented on KAFKA-3296: - Looks like this breaks for every topic on the broker (only tested with single-broker clusters though)
[jira] [Comment Edited] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168952#comment-15168952 ] Simon Cooper edited comment on KAFKA-3296 at 2/26/16 1:07 PM:
--
We've observed the same behaviour with a 0.9.0.0 consumer & 0.9.0.1 broker, and a 0.9.0.1 consumer & 0.9.0.1 broker. The consumer was using manual partition assignment to a single-partition topic (no consumer groups or anything like that), and the hang was on a position() call. This only happens very occasionally, on the order of several days between triggers, when starting >10 VMs a day.

was (Author: thecoop1984):
We've observed the same behaviour with a 0.9.0.0 consumer & 0.9.0.1 broker, and a 0.9.0.1 consumer & 0.9.0.1 broker. The consumer was using manual partition assignment to a single-partition topic (no consumer groups or anything like that), and the hang was on a position() call. This only happens very occasionally, on the order of several days between triggers, when starting up >10 VMs a day.
[jira] [Commented] (KAFKA-3296) All consumer reads hang indefinately
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168952#comment-15168952 ] Simon Cooper commented on KAFKA-3296: - We've observed the same behaviour with a 0.9.0.0 consumer & 0.9.0.1 broker, and 0.9.0.1 consumer and 0.9.0.1 broker. The consumer was using manual partition assignment to a single-partition topic (no cgroups or anything like that), and the hang was on a position() call. This only happens very occasionally, on the order of several days between triggers, when starting up >10 VMs a day. > All consumer reads hang indefinately > > > Key: KAFKA-3296 > URL: https://issues.apache.org/jira/browse/KAFKA-3296 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.9.0.1 >Reporter: Simon Cooper >Priority: Critical > > We've got several integration tests that bring up systems on VMs for testing. > We've recently upgraded to 0.9, and very occasionally we occasionally see an > issue where every consumer that tries to read from the broker hangs, spamming > the following in their logs: > {code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient > [pool-10-thread-1] | Sending metadata request > ClientRequest(expectResponse=true, callback=null, > request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1}, > body={topics=[Topic1]}), isInitiatedByNetworkClient, > createdTimeMs=1456489537856, sendTimeMs=0) to node 1 > 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | > Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, > server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = > 0, leader = 1, replicas = [1,], isr = [1,]]) > 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Issuing group metadata request to broker 1 > 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Group metadata response > 
ClientResponse(receivedTimeMs=1456489537857, disconnected=false, > request=ClientRequest(expectResponse=true, > callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273, > > request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1}, > body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), > responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) > 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | > Sending metadata request ClientRequest(expectResponse=true, callback=null, > request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1}, > body={topics=[Topic1]}), isInitiatedByNetworkClient, > createdTimeMs=1456489537956, sendTimeMs=0) to node 1 > 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | > Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, > server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = > 0, leader = 1, replicas = [1,], isr = [1,]]) > 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Issuing group metadata request to broker 1 > 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Group metadata response > ClientResponse(receivedTimeMs=1456489537957, disconnected=false, > request=ClientRequest(expectResponse=true, > callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc, > > request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1}, > body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), > responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) > 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | > Sending metadata request ClientRequest(expectResponse=true, callback=null, > 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1}, > body={topics=[Topic1]}), isInitiatedByNetworkClient, > createdTimeMs=1456489538056, sendTimeMs=0) to node 1 > 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | > Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, > server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = > 0, leader = 1, replicas = [1,], isr = [1,]]) > 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Issuing group metadata request to broker 1 > 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator > [pool-10-thread-1] | Group metadata response > ClientResponse(receivedTimeMs=1456489538057, disconnected=false, >
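The hang reported above was on a blocking position() call, which in the 0.9 client takes no timeout. One defensive pattern on the application side is to run potentially blocking consumer calls under an explicit deadline. This is a stdlib-only sketch — the Kafka call itself is stubbed out as a Callable, and no Kafka API is assumed here:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingCallGuard {
    // Runs a potentially blocking call (e.g. () -> consumer.position(tp)) under a
    // deadline, so a wedged broker connection fails fast with TimeoutException
    // instead of hanging the calling thread indefinitely.
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            // Future.get throws TimeoutException if the call doesn't finish in time,
            // and ExecutionException (wrapping the cause) if the call itself fails.
            return ex.submit(call).get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            ex.shutdownNow(); // interrupt the stuck call; don't leak the worker thread
        }
    }
}
```

This doesn't fix the broker-side cause, but it turns an indefinite hang into a detectable TimeoutException that a supervisor can act on, for example by recreating the consumer.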
[jira] [Updated] (KAFKA-3296) All consumer reads hang indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Cooper updated KAFKA-3296: Description: We've got several integration tests that bring up systems on VMs for testing. We've recently upgraded to 0.9, and very occasionally we see an issue where every consumer that tries to read from the broker hangs, spamming the following in their logs: {code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537856, sendTimeMs=0) to node 1 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537857, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537956, sendTimeMs=0) to node 1 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537957, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489538056, sendTimeMs=0) to node 1 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] 
| Group metadata response ClientResponse(receivedTimeMs=1456489538057, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}){code} This persists for any 0.9 consumer trying to read from the topic (we haven't confirmed if this is for a single topic or for any topic on the broker). 0.8 consumers can read from the broker without issues. This is fixed by a broker restart. This was observed on a single-broker cluster. There were no suspicious log messages on the server. was: We've got several integration tests that bring up systems on VMs for testing. We've recently upgraded to 0.9, and very occasionally we
[jira] [Created] (KAFKA-3296) All consumer reads hang indefinitely
Simon Cooper created KAFKA-3296: --- Summary: All consumer reads hang indefinitely Key: KAFKA-3296 URL: https://issues.apache.org/jira/browse/KAFKA-3296 Project: Kafka Issue Type: Bug Affects Versions: 0.9.0.1, 0.9.0.0 Reporter: Simon Cooper Priority: Critical We've got several integration tests that bring up systems on VMs for testing. We've recently upgraded to 0.9, and very occasionally we see an issue where every consumer that tries to read from the broker hangs, spamming the following in their logs: {code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537856, sendTimeMs=0) to node 1 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537857, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, 
callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489537956, sendTimeMs=0) to node 1 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489537957, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1}, body={topics=[Topic1]}), isInitiatedByNetworkClient, createdTimeMs=1456489538056, sendTimeMs=0) to node 1 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, replicas = [1,], isr = [1,]]) 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator [pool-10-thread-1] | Issuing group metadata request to broker 1 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response ClientResponse(receivedTimeMs=1456489538057, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb, request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1}, body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}){code} This persists for any 0.9 consumer trying to read from the topic (we haven't confirmed if this is for a single topic or for any topic on the broker). 0.8 consumers can read from the broker without issues. This is fixed by a broker restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2904) Consumer Fails to Reconnect after 30s post restarts
[ https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134158#comment-15134158 ] Simon Cooper commented on KAFKA-2904: - We've seen a similar issue when first starting up brokers - consumers don't work for ~30s after startup, but it is sometimes as much as several minutes before consumers work! We're not using consumer groups at all, we're handling partition assignment & offset management manually. > Consumer Fails to Reconnect after 30s post restarts > --- > > Key: KAFKA-2904 > URL: https://issues.apache.org/jira/browse/KAFKA-2904 > Project: Kafka > Issue Type: Bug >Reporter: Ben Stopford >Assignee: Ben Stopford > Attachments: 2015-11-27--001 (1).tar.gz > > > This problem occurs in around 1 in 20 executions of the security rolling > upgrade test. > Test scenario is a rolling upgrade where each of the three servers are > restarted in turn whilst producer and consumers run. A ten second sleep > between start and stop of each node has been added to ensure there is time > for failover to occur (re KAFKA-2827). > Failure results in no consumed messages after the failure point. > Periodically the consumer does not reconnect for its 30s timeout. The > consumer’s log at this point is at the bottom of this jira. > ISR's appear normal at the time of the failure. > The producer is able to produce throughout this period. > *TIMELINE:* > {quote} > 20:39:23 - Test starts Consumer and Producer > 20:39:27 - Consumer starts consuming produced messages > 20:39:30 - Node 1 shutdown complete > 20:39:45 - Node 1 restarts > 20:39:59 - Node 2 shutdown complete > 20:40:14 - Node 2 restarts > 20:40:27 - Consumer stops consuming > 20:40:28 - Node 2 becomes controller > 20:40:28 - Node 3 shutdown complete > 20:40:34 - GroupCoordinator 2: Preparing to restabilize group > unique-test-group... > 20:40:42 - Node 3 restarts > *20:41:03 - Consumer times out* > 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group... 
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group > unique-test-group... > 20:41:03 - GroupCoordinator 2: Preparing to restabilize group > unique-test-group... > 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed > 20:41:53 - Producer shuts down > {quote} > Consumer log at time of failure: > {quote} > [2015-11-27 20:40:27,268] INFO Current consumption count is 10100 > (kafka.tools.ConsoleConsumer$) > [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while > committing offsets for group unique-test-group-0.952644842527 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be > completed due to group rebalance > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while > committing offsets for group unique-test-group-0.952644842527 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,322] WARN Auto offset commit failed: > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,329] INFO Attempt to join group > unique-test-group-0.952644842527 failed due to unknown member id, resetting > and retrying. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,347] INFO SyncGroup for group > unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining > the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,357] INFO SyncGroup for group > unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, > will find new coordinator and rejoin > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:28,097] INFO Attempt to join group > unique-test-group-0.952644842527 failed due to unknown member id, resetting > and retrying. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:33,627] INFO Attempt to join group > unique-test-group-0.952644842527 failed due to obsolete coordinator > information, retrying. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:41:03,704] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > kafka.consumer.ConsumerTimeoutException > at
[jira] [Comment Edited] (KAFKA-2904) Consumer Fails to Reconnect after 30s post restarts
[ https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134158#comment-15134158 ] Simon Cooper edited comment on KAFKA-2904 at 2/5/16 1:24 PM: - We've seen a similar issue when first starting up brokers - consumers don't work for ~30s after broker startup, but it is sometimes as much as several minutes before consumers work! We're not using consumer groups at all, we're handling partition assignment & offset management manually. was (Author: thecoop1984): We've seen a similar issue when first starting up brokers - consumers don't work for ~30s after startup, but it is sometimes as much as several minutes before consumers work! We're not using consumer groups at all, we're handling partition assignment & offset management manually. > Consumer Fails to Reconnect after 30s post restarts > --- > > Key: KAFKA-2904 > URL: https://issues.apache.org/jira/browse/KAFKA-2904 > Project: Kafka > Issue Type: Bug >Reporter: Ben Stopford >Assignee: Ben Stopford > Attachments: 2015-11-27--001 (1).tar.gz > > > This problem occurs in around 1 in 20 executions of the security rolling > upgrade test. > Test scenario is a rolling upgrade where each of the three servers are > restarted in turn whilst producer and consumers run. A ten second sleep > between start and stop of each node has been added to ensure there is time > for failover to occur (re KAFKA-2827). > Failure results in no consumed messages after the failure point. > Periodically the consumer does not reconnect for its 30s timeout. The > consumer’s log at this point is at the bottom of this jira. > ISR's appear normal at the time of the failure. > The producer is able to produce throughout this period. 
> *TIMELINE:* > {quote} > 20:39:23 - Test starts Consumer and Producer > 20:39:27 - Consumer starts consuming produced messages > 20:39:30 - Node 1 shutdown complete > 20:39:45 - Node 1 restarts > 20:39:59 - Node 2 shutdown complete > 20:40:14 - Node 2 restarts > 20:40:27 - Consumer stops consuming > 20:40:28 - Node 2 becomes controller > 20:40:28 - Node 3 shutdown complete > 20:40:34 - GroupCoordinator 2: Preparing to restabilize group > unique-test-group... > 20:40:42 - Node 3 restarts > *20:41:03 - Consumer times out* > 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group... > 20:41:03 - GroupCoordinator 2: Assignment received from leader for group > unique-test-group... > 20:41:03 - GroupCoordinator 2: Preparing to restabilize group > unique-test-group... > 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed > 20:41:53 - Producer shuts down > {quote} > Consumer log at time of failure: > {quote} > [2015-11-27 20:40:27,268] INFO Current consumption count is 10100 > (kafka.tools.ConsoleConsumer$) > [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while > committing offsets for group unique-test-group-0.952644842527 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be > completed due to group rebalance > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while > committing offsets for group unique-test-group-0.952644842527 > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,322] WARN Auto offset commit failed: > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > [2015-11-27 20:40:27,329] INFO Attempt to join group > unique-test-group-0.952644842527 failed due to unknown member id, resetting > and retrying. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,347] INFO SyncGroup for group > unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining > the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,357] INFO SyncGroup for group > unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, > will find new coordinator and rejoin > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:28,097] INFO Attempt to join group > unique-test-group-0.952644842527 failed due to unknown member id, resetting > and retrying. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2015-11-27 20:40:33,627] INFO Attempt to join group >
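The commenter above handles partition assignment and offset management manually, bypassing the group coordinator entirely. The client-side bookkeeping that approach requires can be sketched with a minimal in-memory offset store; the actual consumer.assign()/seek() calls are omitted, and all names here are illustrative rather than Kafka APIs:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of client-side offset bookkeeping for manually assigned
// partitions: the application, not the broker's group coordinator, remembers
// how far each partition has been processed.
public class ManualOffsetStore {
    private final Map<String, Long> nextOffset = new ConcurrentHashMap<>();

    // Record that everything before 'offset' has been processed.
    // Math::max guards against a stale commit rewinding progress.
    public void commit(String topicPartition, long offset) {
        nextOffset.merge(topicPartition, offset, Math::max);
    }

    // The offset a restarted consumer should seek() to; 0 for a fresh partition.
    public long resumeFrom(String topicPartition) {
        return nextOffset.getOrDefault(topicPartition, 0L);
    }
}
```

In a real deployment this map would be persisted (to a database, a file, or a Kafka topic) so offsets survive restarts; the in-memory version only shows the contract.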
[jira] [Commented] (KAFKA-1150) Fetch on a replicated topic does not return as soon as possible
[ https://issues.apache.org/jira/browse/KAFKA-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073025#comment-14073025 ] Simon Cooper commented on KAFKA-1150: - Yes, I built a replicated system with the patched kafka, and ran the test for about 10 minutes, with no delayed polls. Fetch on a replicated topic does not return as soon as possible --- Key: KAFKA-1150 URL: https://issues.apache.org/jira/browse/KAFKA-1150 Project: Kafka Issue Type: Bug Components: core, replication Affects Versions: 0.8.0 Reporter: Andrey Balmin Assignee: Neha Narkhede Attachments: Test.java I see a huge performance difference between replicated and not replicated topics. On my laptop, running two brokers, I see producer-2-consumer latency of under 1ms for topics with one replica. However, with two replicas the same latency equals the max fetch delay. Here is a simple test I just did: one producer thread in a loop sending one message and sleeping for 2500ms, and one consumer thread looping on the long poll with max fetch delay of 1000 ms. Here is what happens with no replication: Produced 1 key: key1 at time: 15:33:52.822 Consumed up to 1 at time: 15:33:52.822 Consumed up to 1 at time: 15:33:53.823 Consumed up to 1 at time: 15:33:54.825 Produced 2 key: key2 at time: 15:33:55.324 Consumed up to 2 at time: 15:33:55.324 Consumed up to 2 at time: 15:33:56.326 Consumed up to 2 at time: 15:33:57.328 Produced 3 key: key3 at time: 15:33:57.827 Consumed up to 3 at time: 15:33:57.827 There are no delays between the message being produced and consumed -- this is the behavior I expected. 
Here is the same test, but for a topic with two replicas: Consumed up to 0 at time: 15:50:29.575 Produced 1 key: key1 at time: 15:50:29.575 Consumed up to 1 at time: 15:50:30.577 Consumed up to 1 at time: 15:50:31.579 Consumed up to 1 at time: 15:50:32.078 Produced 2 key: key2 at time: 15:50:32.078 Consumed up to 2 at time: 15:50:33.081 Consumed up to 2 at time: 15:50:34.081 Consumed up to 2 at time: 15:50:34.581 Produced 3 key: key3 at time: 15:50:34.581 Consumed up to 3 at time: 15:50:35.584 Notice how the fetch always returns as soon as the produce request is issued, but without the new message, which consistently arrives ~1002 ms later. Below is the request log snippet for this part: Produced 2 key: key2 at time: 15:50:32.078 Consumed up to 2 at time: 15:50:33.081 You can see the first FetchRequest returns at the same time as the replica FetchRequest, but this fetch response is *empty* -- the message is not committed yet, so it cannot be returned. The message is committed at 15:50:32,079. However, the next FetchRequest (that does return the message) comes in at 15:50:32,078, but completes only at 15:50:33,081. Why is it waiting for the full 1000 ms, instead of returning right away? 
[2013-11-25 15:50:32,077] TRACE Processor 1 received request : Name: ProducerRequest; Version: 0; CorrelationId: 5; ClientId: pro; RequiredAcks: 1; AckTimeoutMs: 20 ms; TopicAndPartition: [test_topic,0] - 2078 (kafka.network.RequestChannel$) [2013-11-25 15:50:32,078] TRACE Completed request:Name: FetchRequest; Version: 0; CorrelationId: 7; ClientId: con; ReplicaId: -1; MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1024000) from client /0:0:0:0:0:0:0:1%0:63264;totalTime:499,queueTime:0,localTime:0,remoteTime:499,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,078] TRACE Completed request:Name: FetchRequest; Version: 0; CorrelationId: 3463; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1048576) from client /127.0.0.1:63056;totalTime:499,queueTime:1,localTime:0,remoteTime:498,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,078] TRACE Processor 1 received request : Name: FetchRequest; Version: 0; CorrelationId: 8; ClientId: con; ReplicaId: -1; MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1024000) (kafka.network.RequestChannel$) [2013-11-25 15:50:32,078] TRACE Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 5; ClientId: pro; RequiredAcks: 1; AckTimeoutMs: 20 ms; TopicAndPartition: [test_topic,0] - 2078 from client /0:0:0:0:0:0:0:1%0:63266;totalTime:1,queueTime:0,localTime:1,remoteTime:0,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,079] TRACE Processor 0 received request : Name: FetchRequest; Version: 0; CorrelationId: 3464; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] -
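The attached Test.java isn't reproduced in this archive, but the shape of the test the reporter describes — a producer sending one message then sleeping, and a consumer long-polling with a bounded max wait — can be sketched with a BlockingQueue standing in for the topic. The point being measured is that a correct long poll unblocks as soon as data arrives, not after the full max wait:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LongPollLatencySketch {
    // Returns how long a long poll with 'maxWaitMs' actually blocked when a
    // message is produced after 'produceDelayMs'. With correct long-poll
    // behaviour the elapsed time tracks the produce delay, not the max wait.
    public static long measure(long produceDelayMs, long maxWaitMs) throws Exception {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // stand-in for the partition
        ExecutorService producer = Executors.newSingleThreadExecutor();
        producer.submit(() -> {
            Thread.sleep(produceDelayMs); // producer sleeps, then sends one message
            topic.put("key1");
            return null;
        });
        long start = System.nanoTime();
        topic.poll(maxWaitMs, TimeUnit.MILLISECONDS); // the long poll under test
        producer.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In the bug, the replicated-topic fetch behaved as if poll always waited out maxWaitMs even though the message had been committed; the sketch shows the expected baseline behaviour.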
[jira] [Commented] (KAFKA-1150) Fetch on a replicated topic does not return as soon as possible
[ https://issues.apache.org/jira/browse/KAFKA-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071919#comment-14071919 ] Simon Cooper commented on KAFKA-1150: - Yes, looks like that patch fixes the issue. Fetch on a replicated topic does not return as soon as possible --- Key: KAFKA-1150 URL: https://issues.apache.org/jira/browse/KAFKA-1150 Project: Kafka Issue Type: Bug Components: core, replication Affects Versions: 0.8.0 Reporter: Andrey Balmin Assignee: Neha Narkhede Attachments: Test.java I see a huge performance difference between replicated and not replicated topics. On my laptop, running two brokers, I see producer-2-consumer latency of under 1ms for topics with one replica. However, with two replicas the same latency equals the max fetch delay. Here is a simple test I just did: one producer thread in a loop sending one message and sleeping for 2500ms, and one consumer thread looping on the long poll with max fetch delay of 1000 ms. Here is what happens with no replication: Produced 1 key: key1 at time: 15:33:52.822 Consumed up to 1 at time: 15:33:52.822 Consumed up to 1 at time: 15:33:53.823 Consumed up to 1 at time: 15:33:54.825 Produced 2 key: key2 at time: 15:33:55.324 Consumed up to 2 at time: 15:33:55.324 Consumed up to 2 at time: 15:33:56.326 Consumed up to 2 at time: 15:33:57.328 Produced 3 key: key3 at time: 15:33:57.827 Consumed up to 3 at time: 15:33:57.827 There are no delays between the message being produced and consumed -- this is the behavior I expected. 
Here is the same test, but for a topic with two replicas: Consumed up to 0 at time: 15:50:29.575 Produced 1 key: key1 at time: 15:50:29.575 Consumed up to 1 at time: 15:50:30.577 Consumed up to 1 at time: 15:50:31.579 Consumed up to 1 at time: 15:50:32.078 Produced 2 key: key2 at time: 15:50:32.078 Consumed up to 2 at time: 15:50:33.081 Consumed up to 2 at time: 15:50:34.081 Consumed up to 2 at time: 15:50:34.581 Produced 3 key: key3 at time: 15:50:34.581 Consumed up to 3 at time: 15:50:35.584 Notice how the fetch always returns as soon as the produce request is issued, but without the new message, which consistently arrives ~1002 ms later. Below is the request log snippet for this part: Produced 2 key: key2 at time: 15:50:32.078 Consumed up to 2 at time: 15:50:33.081 You can see the first FetchRequest returns at the same time as the replica FetchRequest, but this fetch response is *empty* -- the message is not committed yet, so it cannot be returned. The message is committed at 15:50:32,079. However, the next FetchRequest (that does return the message) comes in at 15:50:32,078, but completes only at 15:50:33,081. Why is it waiting for the full 1000 ms, instead of returning right away? 
[2013-11-25 15:50:32,077] TRACE Processor 1 received request : Name: ProducerRequest; Version: 0; CorrelationId: 5; ClientId: pro; RequiredAcks: 1; AckTimeoutMs: 20 ms; TopicAndPartition: [test_topic,0] - 2078 (kafka.network.RequestChannel$) [2013-11-25 15:50:32,078] TRACE Completed request:Name: FetchRequest; Version: 0; CorrelationId: 7; ClientId: con; ReplicaId: -1; MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1024000) from client /0:0:0:0:0:0:0:1%0:63264;totalTime:499,queueTime:0,localTime:0,remoteTime:499,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,078] TRACE Completed request:Name: FetchRequest; Version: 0; CorrelationId: 3463; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1048576) from client /127.0.0.1:63056;totalTime:499,queueTime:1,localTime:0,remoteTime:498,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,078] TRACE Processor 1 received request : Name: FetchRequest; Version: 0; CorrelationId: 8; ClientId: con; ReplicaId: -1; MaxWait: 1000 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(129,1024000) (kafka.network.RequestChannel$) [2013-11-25 15:50:32,078] TRACE Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 5; ClientId: pro; RequiredAcks: 1; AckTimeoutMs: 20 ms; TopicAndPartition: [test_topic,0] - 2078 from client /0:0:0:0:0:0:0:1%0:63266;totalTime:1,queueTime:0,localTime:1,remoteTime:0,sendTime:0 (kafka.request.logger) [2013-11-25 15:50:32,079] TRACE Processor 0 received request : Name: FetchRequest; Version: 0; CorrelationId: 3464; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test_topic,0] - PartitionFetchInfo(130,1048576) (kafka.network.RequestChannel$) [2013-11-25 15:50:32,581] TRACE
[jira] [Commented] (KAFKA-1352) Reduce logging on the server
[ https://issues.apache.org/jira/browse/KAFKA-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997719#comment-13997719 ] Simon Cooper commented on KAFKA-1352: - Another example is when there's an exception in the ReplicaFetcherThread: {code} [2014-05-14 12:33:37,811] WARN [ReplicaFetcherThread-0-2], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 36409; ClientId: ReplicaFetcherThread-0-2; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: queues and PartitionFetchInfos java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method) at sun.nio.ch.Net.connect(Net.java:465) at sun.nio.ch.Net.connect(Net.java:457) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670) at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57) at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44) at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:57) at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79) at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:94) at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) [2014-05-14 12:33:37,811] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer) {code} In this case, this error is printed out to the console every *millisecond*, causing the daemon log file to become *very* large *very* quickly. Reduce logging on the server Key: KAFKA-1352 URL: https://issues.apache.org/jira/browse/KAFKA-1352 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.8.1 Reporter: Neha Narkhede Assignee: Ivan Lyutov Labels: newbie, usability Fix For: 0.8.2 Attachments: KAFKA-1352.patch, KAFKA-1352.patch, KAFKA-1352_2014-04-04_21:20:31.patch We have excessive logging in the server, making the logs unreadable and also affecting the performance of the server in practice. We need to clean the logs to address these issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
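Until the logging volume itself is reduced server-side, operators can throttle the offending logger in the broker's log4j configuration. A sketch, assuming Kafka's standard config/log4j.properties layout — the logger names are taken from the classes in the trace above and should be verified against your Kafka version:

```properties
# Quieten the per-fetch connection WARNs from the replica fetcher thread;
# a persistent failure still surfaces through ERROR-level logs and metrics.
log4j.logger.kafka.server.ReplicaFetcherThread=ERROR

# Suppress the INFO-level "Reconnect due to socket error" spam from the
# fetcher's underlying SimpleConsumer.
log4j.logger.kafka.consumer.SimpleConsumer=WARN
```

This is a mitigation, not a fix: the fetcher still retries every millisecond, it just stops writing a stack trace each time.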
[jira] [Commented] (KAFKA-1182) Topic not created if number of live brokers less than # replicas
[ https://issues.apache.org/jira/browse/KAFKA-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997445#comment-13997445 ] Simon Cooper commented on KAFKA-1182:
-

The current behaviour is causing problems for us - we've got an automated install script that creates several topics, and when creating lots of topics in sequence Kafka has a tendency to bounce replicas, causing spurious failures in the install script. Ideally, creation of an under-replicated topic (both ad hoc and using the kafka-topics.sh script) could be forced, so an automated install could complete, allowing the under-replicated topics to be dealt with afterwards.

> Topic not created if number of live brokers less than # replicas
>
> Key: KAFKA-1182
> URL: https://issues.apache.org/jira/browse/KAFKA-1182
> Project: Kafka
> Issue Type: Improvement
> Components: producer
> Affects Versions: 0.8.0
> Environment: Centos 6.3
> Reporter: Hanish Bansal
> Assignee: Jun Rao
>
> We are having a Kafka cluster of 2 nodes (using Kafka 0.8.0).
> Replication Factor: 2
> Number of partitions: 2
> Actual Behaviour: Out of the two nodes, if any one node goes down then the topic is not created in Kafka.
> Steps to Reproduce:
> 1. Create a 2-node Kafka cluster with replication factor 2
> 2. Start the Kafka cluster
> 3. Kill any one node
> 4. Start the producer to write on a new topic
> 5. Observe the exception stated below:
> {code}
> 2013-12-12 19:37:19 0 [WARN ] ClientUtils$ - Fetching topic metadata with correlation id 3 for topics [Set(test-topic)] from broker [id:0,host:122.98.12.11,port:9092] failed
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.Net.connect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
> 	at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
> 	at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
> 	at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
> 	at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
> 	at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
> 	at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
> 	at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
> 	at kafka.producer.BrokerPartitionInfo.getBrokerPartitionInfo(BrokerPartitionInfo.scala:49)
> 	at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$getPartitionListForTopic(DefaultEventHandler.scala:186)
> 	at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:150)
> 	at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:149)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
> 	at kafka.producer.async.DefaultEventHandler.partitionAndCollate(DefaultEventHandler.scala:149)
> 	at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:95)
> 	at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72)
> 	at kafka.producer.Producer.send(Producer.scala:76)
> 	at kafka.javaapi.producer.Producer.send(Producer.scala:33)
> {code}
> Expected Behaviour: In case there are fewer live brokers than # replicas, the topic should still be created so that at least the live brokers can receive the data. They can replicate the data to the other brokers once any down broker comes back up. As it stands, when there are fewer live brokers than # replicas, there is complete loss of data.
-- This message was sent by Atlassian JIRA (v6.2#6252)
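Until topic creation can be forced, an install script can at least tolerate the bouncing replicas by retrying the creation call instead of aborting on the first failure. Below is a minimal, hypothetical sketch (the `Retry` class and its parameters are invented for illustration, not part of Kafka) of a generic retry wrapper that an install script could put around its topic-creation step:

```java
import java.util.concurrent.Callable;

/** Hypothetical install-script helper: retries an operation (e.g. creating
 *  a topic via the admin tooling) that may fail transiently while brokers
 *  or replicas are bouncing, with linear backoff between attempts. */
public class Retry {
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;  // remember the failure; rethrown if all attempts fail
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);  // linear backoff
                }
            }
        }
        throw last;
    }
}
```

The operation passed in would be whatever the script uses to create a topic (for example, shelling out to kafka-topics.sh); the wrapper turns a spurious failure during a replica bounce into a delayed retry rather than a failed install.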
[jira] [Created] (KAFKA-1444) kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the toString method
Simon Cooper created KAFKA-1444:
---

Summary: kafka.javaapi.TopicMetadata and PartitionMetadata doesn't forward the toString method
Key: KAFKA-1444
URL: https://issues.apache.org/jira/browse/KAFKA-1444
Project: Kafka
Issue Type: Bug
Reporter: Simon Cooper
Priority: Minor

The kafka.javaapi.PartitionMetadata and TopicMetadata classes forward their other methods to the underlying kafka.api.PartitionMetadata/TopicMetadata classes, but not toString. This means toString on these classes doesn't produce useful output.
-- This message was sent by Atlassian JIRA (v6.2#6252)
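The missing delegation can be illustrated with a minimal sketch (the class names below are invented stand-ins, not the actual Kafka sources): a javaapi-style wrapper should override toString() to forward to the wrapped object, just as it forwards its other accessors; otherwise callers get the uninformative default Object.toString() output.

```java
/** Stand-in for an underlying kafka.api-style metadata class. */
class UnderlyingPartitionMetadata {
    private final int partitionId;

    UnderlyingPartitionMetadata(int partitionId) {
        this.partitionId = partitionId;
    }

    int partitionId() {
        return partitionId;
    }

    @Override
    public String toString() {
        return "PartitionMetadata(partitionId=" + partitionId + ")";
    }
}

/** Stand-in for a kafka.javaapi-style wrapper class. */
class PartitionMetadataWrapper {
    private final UnderlyingPartitionMetadata underlying;

    PartitionMetadataWrapper(UnderlyingPartitionMetadata underlying) {
        this.underlying = underlying;
    }

    // Accessors are forwarded to the underlying object...
    int partitionId() {
        return underlying.partitionId();
    }

    // ...and the fix requested by this ticket is to forward toString() too,
    // instead of inheriting Object.toString()'s "ClassName@hashcode" form.
    @Override
    public String toString() {
        return underlying.toString();
    }
}
```

With the override in place, logging or printing the wrapper yields the descriptive underlying representation rather than an opaque object identity string.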