[
https://issues.apache.org/jira/browse/KAFKA-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
alex gabriel updated KAFKA-8206:
--------------------------------
Description:
*A consumer can't discover new group coordinator when the cluster was partly
restarted*
Preconditions:
I use Kafka server and Java kafka-client lib 2.2 version
I have 2 Kafka nodes running localy (localhost:9092, localhost:9093) and 1
ZK(localhost:2181)
I have replication factor 2 for the all my topics and
'_unclean.leader.election.enable=true_' on both Kafka nodes.
Steps to reproduce:
1) Start 2nodes (localhost:9092/localhost:9093)
2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
{noformat}
// discovered group coordinator (0-node)
2019-04-09 16:23:18,963 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
...metadatacache is updated (2 nodes in the cluster list)
2019-04-09 16:23:18,928 DEBUG
[org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
[Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending
metadata request (type=MetadataRequest, topics=<ALL>) to node localhost:9092
(id: -1 rack: null)>
2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null),
localhost:9093 (id: 1 rack: null)], partitions = [], controller =
localhost:9092 (id: 0 rack: null))}>
{noformat}
3) Shutdown 1-node (localhost:9093)
{noformat}
// metadata was updated to the 4 version (but for some reasons it still had 2
alive nodes inside cluster)
2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null),
localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic =
events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1],
offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader =
0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller =
localhost:9092 (id: 0 rack: null))}>
//consumers thinks that node-1 is still alive and try to send coordinator
lookup to it but failed
2019-04-09 16:23:46,981 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
2019-04-09 16:23:46,981 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or
invalid, will attempt rediscovery>
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
2019-04-09 16:24:01,117 WARN
[org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1
(localhost:9093) could not be established. Broker may not be available.>
// refreshing metadata again
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Cancelled
request with header RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2,
clientId=events-consumer0, correlationId=112) due to node 1 being disconnected>
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Coordinator discovery failed, refreshing metadata>
// metadata was updated to the 5 version where cluster had only 0-node
localhost:9092 as expected.
2019-04-09 16:24:01,131 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 5 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null)], partitions
= [Partition(topic = events-sorted, partition = 1, leader = 0, replicas =
[0,1], isr = [0], offlineReplicas = [1]), Partition(topic = events-sorted,
partition = 0, leader = 0, replicas = [0,1], isr = [0], offlineReplicas =
[1])], controller = localhost:9092 (id: 0 rack: null))}>
// 0-node discovered as coordinator
2019-04-09 16:24:01,132 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
{noformat}
*At this point consumer stores only information about 0-node(localhost:9092)
inside cluster property of the metadata cache.*
4) Shutdown 0-node (localhost:9092)
5) Start 1-node (localhost:9093)
{noformat}
//consumer tries to re-connect only to the 0-node
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.common.network.Selector.pollSelectionKeys] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Connection with
localhost disconnected>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 0 disconnected.>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 2147483647
disconnected.>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Cancelled
request with header RequestHeader(apiKey=FETCH, apiVersion=10,
clientId=events-consumer0, correlationId=209) due to node 0 being disconnected>
2019-04-09 16:24:40,649 INFO
[org.apache.kafka.clients.FetchSessionHandler.handleError] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Error sending fetch
request (sessionId=1754516055, epoch=132) to node 0:
org.apache.kafka.common.errors.DisconnectException.>
2019-04-09 16:24:40,650 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
coordinator localhost:9092 (id: 2147483647 rack: null) is unavailable or
invalid, will attempt rediscovery>
2019-04-09 16:24:40,650 DEBUG
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.lookupCoordinator]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] No broker
available to send FindCoordinator request>
2019-04-09 16:24:40,650 DEBUG
[org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
[Consumer clientId=events-consumer0, groupId=events-group-gabriel] Give up
sending metadata request since no node is available>
{noformat}
As I see the consumer tries to initialize connection only to the 0-node cause
this is only node inside consumer's cluster metadata cache, thus consumer can't
get a connection to the new group coordinator 1-node and as result can't poll
any messages
To resolve this pending state 0-node must be restored OR consumer must be
re-started(in this case consumer discover a new node from the initial brokers
list)
p.s. The same behavior could be reproduced using a 3 nodes cluster.
_The question: is this by design that consumer stores metadata cache with only
active nodes list?_
Since it leads to a situation when a last active node from the list goes down
and the other node come to life but the consumer doesn't have any info about
that node and trying to re-connect to the last active node ignoring new node.
>From my point of view, a consumer should always store some initial nodes list
>and try to reconnect to the nodes from the initial list in case if there are
>no alive nodes from the metadata cluster cache.
was:
*A consumer can't discover new group coordinator when the cluster was partly
restarted*
Preconditions:
I use Kafka server and Java kafka-client lib 2.2 version
I have 2 Kafka nodes running localy (localhost:9092, localhost:9093) and 1
ZK(localhost:2181/localhost:2181)
I have replication factor 2 for the all my topics and
'_unclean.leader.election.enable=true_' on both Kafka nodes.
Steps to reproduce:
1) Start 2nodes (localhost:9092/localhost:9093)
2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
{noformat}
// discovered group coordinator (0-node)
2019-04-09 16:23:18,963 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
...metadatacache is updated (2 nodes in the cluster list)
2019-04-09 16:23:18,928 DEBUG
[org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
[Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending
metadata request (type=MetadataRequest, topics=<ALL>) to node localhost:9092
(id: -1 rack: null)>
2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null),
localhost:9093 (id: 1 rack: null)], partitions = [], controller =
localhost:9092 (id: 0 rack: null))}>
{noformat}
3) Shutdown 1-node (localhost:9093)
{noformat}
// metadata was updated to the 4 version (but for some reasons it still had 2
alive nodes inside cluster)
2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null),
localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic =
events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1],
offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader =
0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller =
localhost:9092 (id: 0 rack: null))}>
//consumers thinks that node-1 is still alive and try to send coordinator
lookup to it but failed
2019-04-09 16:23:46,981 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
2019-04-09 16:23:46,981 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or
invalid, will attempt rediscovery>
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
2019-04-09 16:24:01,117 WARN
[org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1
(localhost:9093) could not be established. Broker may not be available.>
// refreshing metadata again
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Cancelled
request with header RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2,
clientId=events-consumer0, correlationId=112) due to node 1 being disconnected>
2019-04-09 16:24:01,117 DEBUG
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Coordinator discovery failed, refreshing metadata>
// metadata was updated to the 5 version where cluster had only 0-node
localhost:9092 as expected.
2019-04-09 16:24:01,131 DEBUG [org.apache.kafka.clients.Metadata.update] -
Updated cluster metadata version 5 to MetadataCache{cluster=Cluster(id =
P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null)], partitions
= [Partition(topic = events-sorted, partition = 1, leader = 0, replicas =
[0,1], isr = [0], offlineReplicas = [1]), Partition(topic = events-sorted,
partition = 0, leader = 0, replicas = [0,1], isr = [0], offlineReplicas =
[1])], controller = localhost:9092 (id: 0 rack: null))}>
// 0-node discovered as coordinator
2019-04-09 16:24:01,132 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
{noformat}
*At this point consumer stores only information about 0-node(localhost:9092)
inside cluster property of the metadata cache.*
4) Shutdown 0-node (localhost:9092)
5) Start 1-node (localhost:9093)
{noformat}
//consumer tries to re-connect only to the 0-node
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.common.network.Selector.pollSelectionKeys] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Connection with
localhost disconnected>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 0 disconnected.>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Node 2147483647
disconnected.>
2019-04-09 16:24:40,649 DEBUG
[org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Cancelled
request with header RequestHeader(apiKey=FETCH, apiVersion=10,
clientId=events-consumer0, correlationId=209) due to node 0 being disconnected>
2019-04-09 16:24:40,649 INFO
[org.apache.kafka.clients.FetchSessionHandler.handleError] - [Consumer
clientId=events-consumer0, groupId=events-group-gabriel] Error sending fetch
request (sessionId=1754516055, epoch=132) to node 0:
org.apache.kafka.common.errors.DisconnectException.>
2019-04-09 16:24:40,650 INFO
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
coordinator localhost:9092 (id: 2147483647 rack: null) is unavailable or
invalid, will attempt rediscovery>
2019-04-09 16:24:40,650 DEBUG
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator.lookupCoordinator]
- [Consumer clientId=events-consumer0, groupId=events-group-gabriel] No broker
available to send FindCoordinator request>
2019-04-09 16:24:40,650 DEBUG
[org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
[Consumer clientId=events-consumer0, groupId=events-group-gabriel] Give up
sending metadata request since no node is available>
{noformat}
As I see the consumer tries to initialize connection only to the 0-node cause
this is only node inside consumer's cluster metadata cache, thus consumer can't
get a connection to the new group coordinator 1-node and as result can't poll
any messages
To resolve this pending state 0-node must be restored OR consumer must be
re-started(in this case consumer discover a new node from the initial brokers
list)
p.s. The same behavior could be reproduced using a 3 nodes cluster.
_The question: is this by design that consumer stores metadata cache with only
active nodes list?_
Since it leads to a situation when a last active node from the list goes down
and the other node come to life but the consumer doesn't have any info about
that node and trying to re-connect to the last active node ignoring new node.
>From my point of view, a consumer should always store some initial nodes list
>and try to reconnect to the nodes from the initial list in case if there are
>no alive nodes from the metadata cluster cache.
> A consumer can't discover new group coordinator when the cluster was partly
> restarted
> -------------------------------------------------------------------------------------
>
> Key: KAFKA-8206
> URL: https://issues.apache.org/jira/browse/KAFKA-8206
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 1.0.0, 2.0.0, 2.2.0
> Reporter: alex gabriel
> Priority: Critical
>
> *A consumer can't discover new group coordinator when the cluster was partly
> restarted*
> Preconditions:
> I use Kafka server and Java kafka-client lib 2.2 version
> I have 2 Kafka nodes running localy (localhost:9092, localhost:9093) and 1
> ZK(localhost:2181)
> I have replication factor 2 for the all my topics and
> '_unclean.leader.election.enable=true_' on both Kafka nodes.
> Steps to reproduce:
> 1) Start 2nodes (localhost:9092/localhost:9093)
> 2) Start consumer with 'bootstrap.servers=localhost:9092,localhost:9093'
> {noformat}
> // discovered group coordinator (0-node)
> 2019-04-09 16:23:18,963 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
> ...metadatacache is updated (2 nodes in the cluster list)
> 2019-04-09 16:23:18,928 DEBUG
> [org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
> [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Sending
> metadata request (type=MetadataRequest, topics=<ALL>) to node localhost:9092
> (id: -1 rack: null)>
> 2019-04-09 16:23:18,940 DEBUG [org.apache.kafka.clients.Metadata.update] -
> Updated cluster metadata version 2 to MetadataCache{cluster=Cluster(id =
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null),
> localhost:9093 (id: 1 rack: null)], partitions = [], controller =
> localhost:9092 (id: 0 rack: null))}>
> {noformat}
> 3) Shutdown 1-node (localhost:9093)
> {noformat}
> // metadata was updated to the 4 version (but for some reasons it still had 2
> alive nodes inside cluster)
> 2019-04-09 16:23:46,981 DEBUG [org.apache.kafka.clients.Metadata.update] -
> Updated cluster metadata version 4 to MetadataCache{cluster=Cluster(id =
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9093 (id: 1 rack: null),
> localhost:9092 (id: 0 rack: null)], partitions = [Partition(topic =
> events-sorted, partition = 1, leader = 0, replicas = [0,1], isr = [0,1],
> offlineReplicas = []), Partition(topic = events-sorted, partition = 0, leader
> = 0, replicas = [0,1], isr = [0,1], offlineReplicas = [])], controller =
> localhost:9092 (id: 0 rack: null))}>
> //consumers thinks that node-1 is still alive and try to send coordinator
> lookup to it but failed
> 2019-04-09 16:23:46,981 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Discovered group coordinator localhost:9093 (id: 2147483646 rack: null)>
> 2019-04-09 16:23:46,981 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
> coordinator localhost:9093 (id: 2147483646 rack: null) is unavailable or
> invalid, will attempt rediscovery>
> 2019-04-09 16:24:01,117 DEBUG
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Node 1 disconnected.>
> 2019-04-09 16:24:01,117 WARN
> [org.apache.kafka.clients.NetworkClient.processDisconnection] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Connection to node 1
> (localhost:9093) could not be established. Broker may not be available.>
> // refreshing metadata again
> 2019-04-09 16:24:01,117 DEBUG
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Cancelled request with header RequestHeader(apiKey=FIND_COORDINATOR,
> apiVersion=2, clientId=events-consumer0, correlationId=112) due to node 1
> being disconnected>
> 2019-04-09 16:24:01,117 DEBUG
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Coordinator discovery failed, refreshing metadata>
> // metadata was updated to the 5 version where cluster had only 0-node
> localhost:9092 as expected.
> 2019-04-09 16:24:01,131 DEBUG [org.apache.kafka.clients.Metadata.update] -
> Updated cluster metadata version 5 to MetadataCache{cluster=Cluster(id =
> P3pz1xU0SjK-Dhy6h2G5YA, nodes = [localhost:9092 (id: 0 rack: null)],
> partitions = [Partition(topic = events-sorted, partition = 1, leader = 0,
> replicas = [0,1], isr = [0], offlineReplicas = [1]), Partition(topic =
> events-sorted, partition = 0, leader = 0, replicas = [0,1], isr = [0],
> offlineReplicas = [1])], controller = localhost:9092 (id: 0 rack: null))}>
> // 0-node discovered as coordinator
> 2019-04-09 16:24:01,132 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)>
> {noformat}
> *At this point consumer stores only information about 0-node(localhost:9092)
> inside cluster property of the metadata cache.*
> 4) Shutdown 0-node (localhost:9092)
> 5) Start 1-node (localhost:9093)
> {noformat}
> //consumer tries to re-connect only to the 0-node
> 2019-04-09 16:24:40,649 DEBUG
> [org.apache.kafka.common.network.Selector.pollSelectionKeys] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Connection with
> localhost disconnected>
> 2019-04-09 16:24:40,649 DEBUG
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Node 0 disconnected.>
> 2019-04-09 16:24:40,649 DEBUG
> [org.apache.kafka.clients.NetworkClient.handleDisconnections] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Node 2147483647
> disconnected.>
> 2019-04-09 16:24:40,649 DEBUG
> [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel]
> Cancelled request with header RequestHeader(apiKey=FETCH, apiVersion=10,
> clientId=events-consumer0, correlationId=209) due to node 0 being
> disconnected>
> 2019-04-09 16:24:40,649 INFO
> [org.apache.kafka.clients.FetchSessionHandler.handleError] - [Consumer
> clientId=events-consumer0, groupId=events-group-gabriel] Error sending fetch
> request (sessionId=1754516055, epoch=132) to node 0:
> org.apache.kafka.common.errors.DisconnectException.>
> 2019-04-09 16:24:40,650 INFO
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.markCoordinatorUnknown]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Group
> coordinator localhost:9092 (id: 2147483647 rack: null) is unavailable or
> invalid, will attempt rediscovery>
> 2019-04-09 16:24:40,650 DEBUG
> [org.apache.kafka.clients.consumer.internals.AbstractCoordinator.lookupCoordinator]
> - [Consumer clientId=events-consumer0, groupId=events-group-gabriel] No
> broker available to send FindCoordinator request>
> 2019-04-09 16:24:40,650 DEBUG
> [org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate] -
> [Consumer clientId=events-consumer0, groupId=events-group-gabriel] Give up
> sending metadata request since no node is available>
> {noformat}
> As I see the consumer tries to initialize connection only to the 0-node cause
> this is only node inside consumer's cluster metadata cache, thus consumer
> can't get a connection to the new group coordinator 1-node and as result
> can't poll any messages
> To resolve this pending state 0-node must be restored OR consumer must be
> re-started(in this case consumer discover a new node from the initial brokers
> list)
> p.s. The same behavior could be reproduced using a 3 nodes cluster.
> _The question: is this by design that consumer stores metadata cache with
> only active nodes list?_
> Since it leads to a situation when a last active node from the list goes down
> and the other node come to life but the consumer doesn't have any info about
> that node and trying to re-connect to the last active node ignoring new node.
> From my point of view, a consumer should always store some initial nodes list
> and try to reconnect to the nodes from the initial list in case if there are
> no alive nodes from the metadata cluster cache.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)