[ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch resolved KAFKA-9374.
----------------------------------
      Reviewer: Konstantine Karantasis
    Resolution: Fixed

Merged to `trunk` and backported to the `2.6` branch for inclusion in 2.6.0.

> Worker can be disabled by blocked connectors
> --------------------------------------------
>
>                 Key: KAFKA-9374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9374
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0,
>                      2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 2.6.0
>
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}},
> {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}}
> methods, the worker will be disabled for some types of requests thereafter,
> including connector creation, connector reconfiguration, and connector
> deletion.
>
> -This only occurs in distributed mode and is due to the threading model used
> by the
> [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
> class.-
>
> This affects both distributed and standalone mode. Distributed herders
> perform some connector work synchronously in their {{tick}} thread, which
> also handles group membership and some REST requests. The majority of the
> herder methods for the standalone herder are {{synchronized}}, including
> those for creating, updating, and deleting connectors; as long as one of
> those methods blocks, all subsequent calls to any of these methods will also
> be blocked.
>
> One potential solution could be to treat connectors that fail to start,
> stop, etc. in time similarly to tasks that fail to stop within the
> [task graceful shutdown timeout period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
> by handling all connector interactions on a separate thread, waiting for
> them to complete within a timeout, and abandoning the thread (and
> transitioning the connector to the {{FAILED}} state, if it has been created
> at all) if that timeout expires.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
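
The "separate thread plus timeout" idea proposed in the description can be sketched roughly as follows. This is only an illustration of the general technique and does not represent the code merged for this issue. The class ConnectorTimeoutSketch, the State enum, and the callWithTimeout helper are all hypothetical names, not part of Kafka Connect.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in Kafka Connect.
class ConnectorTimeoutSketch {
    // Simplified stand-in for a connector's reported status.
    enum State { RUNNING, FAILED }

    // Runs one connector call (e.g. start or stop) on its own thread and
    // waits at most timeoutMs for it, so the caller can never be blocked
    // indefinitely by a hanging connector.
    static State callWithTimeout(Runnable connectorCall, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-call");
            t.setDaemon(true); // an abandoned thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(connectorCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return State.RUNNING;
        } catch (TimeoutException e) {
            // The call did not finish in time: give up on it and mark FAILED.
            future.cancel(true); // best-effort interrupt; the call may ignore it
            return State.FAILED;
        } catch (ExecutionException e) {
            // The connector call itself threw an exception.
            return State.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return State.FAILED;
        } finally {
            executor.shutdownNow(); // accept no further work on this executor
        }
    }

    public static void main(String[] args) {
        // A well-behaved call returns promptly.
        System.out.println(callWithTimeout(() -> { }, 1000));

        // A call that hangs is abandoned once the timeout expires.
        Runnable blocked = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted after being abandoned
            }
        };
        System.out.println(callWithTimeout(blocked, 100));
    }
}
```

Note that interruption is only best effort: a connector that ignores interrupts leaves its thread running, which is why the sketch abandons the thread rather than waiting for it to exit.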