Chris Egerton created KAFKA-15059:
-------------------------------------

             Summary: Exactly-once source tasks fail to start during pending rebalances
                 Key: KAFKA-15059
                 URL: https://issues.apache.org/jira/browse/KAFKA-15059
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect, mirrormaker
    Affects Versions: 3.3.2, 3.3.1, 3.4.0, 3.3.0, 3.5.0, 3.4.1
            Reporter: Chris Egerton
            Assignee: Chris Egerton


When asked to perform a round of zombie fencing, the distributed herder will 
[reject the request|https://github.com/apache/kafka/blob/17fd30e6b457f097f6a524b516eca1a6a74a9144/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1249-L1250] 
if a rebalance is pending, which can happen if (among other things) a config 
for a new connector or a new set of task configs has been recently read from 
the config topic.

An exactly-once source task whose fencing request is rejected this way fails 
to start. Normally this can be remedied with a simple task restart, which 
isn't great but isn't terrible.

However, when running MirrorMaker 2 in dedicated mode, there is no API to 
restart failed tasks, and it can be more common to see this kind of failure on 
a fresh cluster because three connector configurations are written in rapid 
succession to the config topic.

To provide a better experience for users of both vanilla Kafka Connect and 
dedicated MirrorMaker 2 clusters, we can retry zombie fencing attempts that 
fail due to a pending rebalance, likely with the same exponential backoff 
introduced with KAFKA-14732.
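
A rough sketch of what such a retry loop could look like is below. It is not the actual Connect runtime code and makes no claim about how KAFKA-14732 implements its backoff: the class name, the RebalancePendingException type, and the attempt and backoff parameters are all hypothetical, chosen only to illustrate retrying a fencing attempt that is rejected because a rebalance is pending.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only; none of these names exist in Kafka Connect.
final class FencingRetrySketch {

    /** Hypothetical stand-in for "fencing rejected because a rebalance is pending". */
    public static class RebalancePendingException extends RuntimeException {
        public RebalancePendingException(String message) {
            super(message);
        }
    }

    /**
     * Runs the fencing attempt, retrying with exponential backoff while it
     * fails with RebalancePendingException. Any other exception propagates
     * immediately, since only the pending-rebalance case is expected to
     * clear up on its own.
     */
    public static <T> T retryOnPendingRebalance(
            Callable<T> fencingAttempt,
            int maxAttempts,          // assumed limit, not taken from the ticket
            long initialBackoffMs,    // assumed starting delay
            long maxBackoffMs         // assumed cap on the delay
    ) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return fencingAttempt.call();
            } catch (RebalancePendingException e) {
                if (attempt >= maxAttempts) {
                    // Out of attempts: surface the failure to the caller.
                    throw e;
                }
                Thread.sleep(backoffMs);
                // Double the delay for the next attempt, up to the cap.
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

Retrying only on the pending-rebalance failure keeps genuine fencing errors visible right away, while the capped backoff gives the worker time to finish rebalancing without retrying in a tight loop.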



--
This message was sent by Atlassian Jira
(v8.20.10#820010)