[ 
https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Cooper updated KAFKA-3296:
--------------------------------
    Description: 
We've got several integration tests that bring up systems on VMs for testing. 
We've recently upgraded to 0.9, and very occasionally we occasionally see an 
issue where every consumer that tries to read from the broker hangs, spamming 
the following in their logs:

{code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient 
[pool-10-thread-1] | Sending metadata request 
ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489537856, sendTimeMs=0) to node 1
2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489537857, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
Sending metadata request ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489537956, sendTimeMs=0) to node 1
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489537957, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
Sending metadata request ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489538056, sendTimeMs=0) to node 1
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10956 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489538057, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}){code}

This persists for any 0.9 consumer trying to read from the topic (we haven't 
confirmed if this is for a single topic or for any topic on the broker). 0.8 
consumers can read from the broker without issues. This is fixed by a broker 
restart.

This was observed on a single-broker cluster. There were no suspicious log 
messages on the server.

  was:
We've got several integration tests that bring up systems on VMs for testing. 
We've recently upgraded to 0.9, and very occasionally we occasionally see an 
issue where every consumer that tries to read from the broker hangs, spamming 
the following in their logs:

{code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient 
[pool-10-thread-1] | Sending metadata request 
ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489537856, sendTimeMs=0) to node 1
2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10954 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489537857, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
Sending metadata request ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489537956, sendTimeMs=0) to node 1
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10955 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489537957, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
Sending metadata request ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1},
 body={topics=[Topic1]}), isInitiatedByNetworkClient, 
createdTimeMs=1456489538056, sendTimeMs=0) to node 1
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | Updated 
cluster metadata version 10956 to Cluster(nodes = [Node(1, server.internal, 
9092)], partitions = [Partition(topic = Topic1, partition = 0, leader = 1, 
replicas = [1,], isr = [1,]])
2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Issuing group metadata request to broker 1
2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
[pool-10-thread-1] | Group metadata response 
ClientResponse(receivedTimeMs=1456489538057, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1},
 body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}){code}

This persists for any 0.9 consumer trying to read from the topic (we haven't 
confirmed if this is for a single topic or for any topic on the broker). 0.8 
consumers can read from the broker without issues. This is fixed by a broker 
restart.


> All consumer reads hang indefinately
> ------------------------------------
>
>                 Key: KAFKA-3296
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3296
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0, 0.9.0.1
>            Reporter: Simon Cooper
>            Priority: Critical
>
> We've got several integration tests that bring up systems on VMs for testing. 
> We've recently upgraded to 0.9, and very occasionally we occasionally see an 
> issue where every consumer that tries to read from the broker hangs, spamming 
> the following in their logs:
> {code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient 
> [pool-10-thread-1] | Sending metadata request 
> ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489537856, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489537857, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), 
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
> Sending metadata request ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489537956, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489537957, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), 
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
> Sending metadata request ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489538056, sendTimeMs=0) to node 1
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489538057, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), 
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}){code}
> This persists for any 0.9 consumer trying to read from the topic (we haven't 
> confirmed if this is for a single topic or for any topic on the broker). 0.8 
> consumers can read from the broker without issues. This is fixed by a broker 
> restart.
> This was observed on a single-broker cluster. There were no suspicious log 
> messages on the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to