Oliver Deakin created KAFKA-4446:
------------------------------------

             Summary: If consumer offsets topic is created with fewer replicas
than min.insync.replicas, consuming is not possible
                 Key: KAFKA-4446
                 URL: https://issues.apache.org/jira/browse/KAFKA-4446
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.1.0
         Environment: Ubuntu 16.04
            Reporter: Oliver Deakin


This is a bit of an edge case, but it has a high impact. I have seen this issue
multiple times when creating a new cluster of Kafka brokers and consuming
components in an automated deployment. Full details of the chain of events are
given below. I expect it could also occur if the first consume from a Kafka
cluster happens while some brokers are in a failure state.

It appears that while the consumer offsets topic can be created with a
replication factor of only 1 or 2 (if only 1 or 2 Kafka brokers are alive when
it is created), min.insync.replicas is still applied. If that value is higher
than the replication factor, the ISR can never reach the required size, so it
becomes impossible to consume any messages. It seems that when a topic is
created explicitly with a replication factor less than min.insync.replicas,
that rule should not be applied, as it makes the topic unusable.
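
The constraint described above can be sketched as a small check. This is an
illustrative model only, not Kafka code: both function names are hypothetical,
and it assumes the rule that a write requiring full acknowledgement is rejected
while the ISR is smaller than min.insync.replicas.

```python
def can_accept_acked_write(isr_size: int, min_insync_replicas: int) -> bool:
    """Model of the rule: a fully acknowledged write is rejected while the
    ISR is smaller than min.insync.replicas."""
    return isr_size >= min_insync_replicas


def topic_is_ever_writable(replication_factor: int, min_insync_replicas: int) -> bool:
    """The ISR can never hold more members than the replication factor, so
    the best case is an ISR that contains every replica."""
    return can_accept_acked_write(replication_factor, min_insync_replicas)


# Values from this report: offsets topic with 1 replica, min.insync.replicas=2.
print(topic_is_ever_writable(replication_factor=1, min_insync_replicas=2))
# A topic with 3 replicas under the same setting, for comparison.
print(topic_is_ever_writable(replication_factor=3, min_insync_replicas=2))
```

With a replication factor of 1 the check can never pass, however many brokers
later join the cluster.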

Detailed scenario:
 - Kafka is utilised as an event messaging pipeline around which a number of 
components are deployed that produce and consume messages.
 - Deployments of a new environment bring up all components, including a 3-node
Kafka cluster and some event-driven components, at the same time.
 - Our configuration sets min.insync.replicas=2.
 - Kafka node 1 opens its listener port before the other two brokers come up.
 - One of the components subscribes to a pre-created topic and attempts to
consume from it for the first time, also before the other two Kafka brokers
come up.
 - Kafka node 1 creates the consumer offsets topic with replication factor 1, 
as it is the only live broker. This is expected behaviour as per the 
documentation for offsets.topic.replication.factor.
 - When attempting to write a consumer offset message to that topic, Kafka node
1 fails with a repeating error and never recovers, as there is only 1 member of
the ISR but min.insync.replicas is 2. The repeating error message is:
kafka2_1  | org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition [__consumer_offsets,31] is [1], below required minimum [2]
 - No consumers can consume from this cluster any more.
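
The chain of events above can be replayed with a toy model. This is a rough
sketch, not broker code: the class and function names are hypothetical, the
configured replication factor of 3 is an assumed value, and the fallback to
min(live brokers, configured replication factor) is my reading of the
offsets.topic.replication.factor documentation.

```python
from dataclasses import dataclass


@dataclass
class OffsetsTopic:
    replication_factor: int


def create_offsets_topic(configured_rf: int, live_brokers: int) -> OffsetsTopic:
    # Assumed fallback: the topic gets min(live brokers, configured factor).
    return OffsetsTopic(replication_factor=min(configured_rf, live_brokers))


def commit_offset(topic: OffsetsTopic, live_brokers: int, min_isr: int) -> str:
    # The ISR is bounded by both the replica count and the live brokers.
    isr_size = min(topic.replication_factor, live_brokers)
    if isr_size < min_isr:
        return f"NotEnoughReplicas: ISR is [{isr_size}], below required minimum [{min_isr}]"
    return "ok"


MIN_ISR = 2        # min.insync.replicas from the scenario
CONFIGURED_RF = 3  # hypothetical offsets.topic.replication.factor

# First consume arrives while only node 1 is up.
topic = create_offsets_topic(CONFIGURED_RF, live_brokers=1)
print(commit_offset(topic, live_brokers=1, min_isr=MIN_ISR))

# The other two brokers come up later, but the topic keeps its 1 replica.
print(commit_offset(topic, live_brokers=3, min_isr=MIN_ISR))
```

In this model both commits are rejected, which matches the observation that
the cluster does not recover once the remaining brokers have started.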



(FYI 0.10.1.0 is still listed as unreleased in JIRA, but the project front page 
says it's the latest release)


