[ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch resolved KAFKA-9374.
----------------------------------
      Reviewer: Konstantine Karantasis
    Resolution: Fixed

Merged to `trunk` and backported to the `2.6` branch for inclusion in 2.6.0.

> Worker can be disabled by blocked connectors
> --------------------------------------------
>
>                 Key: KAFKA-9374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9374
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 
> 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 2.6.0
>
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, 
> {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} 
> methods, the worker will be disabled for some types of requests thereafter, 
> including connector creation, connector reconfiguration, and connector 
> deletion.
> -This only occurs in distributed mode and is due to the threading model used
> by the
> [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
> class.- This affects both distributed and standalone modes. The distributed
> herder performs some connector work synchronously in its {{tick}} thread,
> which also handles group membership and some REST requests, so a connector
> call that hangs there stalls all of that work. Most of the standalone
> herder's methods are {{synchronized}}, including those for creating,
> updating, and deleting connectors; while one of those methods is blocked,
> every subsequent call to any of them blocks as well.
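> A minimal sketch of the standalone-mode problem, assuming a herder whose
> methods simply share one monitor. {{SynchronizedHerderSketch}} and its
> methods are hypothetical names, not Kafka's {{StandaloneHerder}} API; the
> latch stands in for a connector whose {{start}} never returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a herder with synchronized methods.
// This is NOT Kafka's StandaloneHerder; it only reproduces the locking behavior.
class SynchronizedHerderSketch {
    private final CountDownLatch startEntered = new CountDownLatch(1);
    private final CountDownLatch startReleased = new CountDownLatch(1);

    // Holds the herder's monitor while a simulated connector start() hangs.
    synchronized void createConnector(String name) {
        startEntered.countDown();
        awaitQuietly(startReleased); // blocks until releaseHungConnector() is called
    }

    // Needs the same monitor, so it cannot run while createConnector is stuck.
    synchronized void deleteConnector(String name) {
        // deletion work would go here
    }

    void releaseHungConnector() {
        startReleased.countDown();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if a delete finished within timeoutMs while a create was hung. */
    static boolean deleteFinishesWhileCreateHangs(long timeoutMs) {
        SynchronizedHerderSketch herder = new SynchronizedHerderSketch();
        CountDownLatch deleted = new CountDownLatch(1);
        Thread creator = new Thread(() -> herder.createConnector("hung-connector"));
        Thread deleter = new Thread(() -> {
            herder.deleteConnector("other-connector");
            deleted.countDown();
        });
        try {
            creator.start();
            herder.startEntered.await(); // creator now holds the monitor
            deleter.start();
            boolean finished = deleted.await(timeoutMs, TimeUnit.MILLISECONDS);
            herder.releaseHungConnector(); // unblock both threads so they can exit
            creator.join();
            deleter.join();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```

> Calling {{deleteFinishesWhileCreateHangs(200)}} returns {{false}}: the
> unrelated delete cannot even enter the herder until the hung create is
> released, which is the behavior described above.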
>  
> One potential solution is to treat connectors that fail to start, stop,
> etc. in time the same way as tasks that fail to stop within the [task graceful
> shutdown timeout
> period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]:
> handle all connector interactions on a separate thread, wait for each one to
> complete within a timeout, and, if that timeout expires, abandon the thread
> and transition the connector to the {{FAILED}} state (if it has been created
> at all).
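> A sketch of the proposed approach, assuming a plain
> {{java.util.concurrent}} executor per call. {{TimedConnectorCallSketch}},
> {{SketchConnectorState}}, and the {{timeoutMs}} parameter are hypothetical
> illustrations, not Kafka Connect classes or configs, and this is not
> necessarily how the merged fix was implemented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a connector's lifecycle state.
enum SketchConnectorState { RUNNING, FAILED }

class TimedConnectorCallSketch {
    /** Runs a connector call on its own thread and gives up after timeoutMs. */
    static SketchConnectorState startWithTimeout(Runnable connectorStart, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-call");
            t.setDaemon(true); // an abandoned, hung thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(connectorStart);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return SketchConnectorState.RUNNING;
        } catch (TimeoutException e) {
            // Best effort only: a connector that ignores interrupts keeps running,
            // but the caller stops waiting on it and marks it FAILED.
            future.cancel(true);
            return SketchConnectorState.FAILED;
        } catch (ExecutionException e) {
            return SketchConnectorState.FAILED; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return SketchConnectorState.FAILED;
        } finally {
            executor.shutdownNow(); // accept no more work; a hung thread is abandoned, not joined
        }
    }
}
```

> The caller (the herder's {{tick}} thread or a {{synchronized}} method) is
> blocked for at most {{timeoutMs}}, so one hung connector can no longer stall
> requests for every other connector.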



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
