[ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010527#comment-17010527 ]
Tom Bentley commented on KAFKA-9374:
------------------------------------
[~ChrisEgerton] yeah, I agree there's no tenable alternative for recovering the
thread, and protecting the worker is a worthy aim. I like the idea of using an
error response (for those cases associated with a request) in addition to
transitioning the connector to failed.
Would the timeout be configurable? (Not arguing that it should be, merely asking.)
> Worker can be disabled by blocked connectors
> --------------------------------------------
>
> Key: KAFKA-9374
> URL: https://issues.apache.org/jira/browse/KAFKA-9374
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0,
> 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}},
> {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}}
> methods, the worker will be disabled for some types of requests thereafter,
> including connector creation, connector reconfiguration, and connector
> deletion.
> -This only occurs in distributed mode and is due to the threading model used
> by the
> [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
> class.- This affects both distributed and standalone mode. Distributed
> herders perform some connector work synchronously in their {{tick}} thread,
> which also handles group membership and some REST requests. Most of the
> standalone herder's methods are {{synchronized}}, including those for
> creating, updating, and deleting connectors, so while one of those methods
> is blocked, every subsequent call to any of them blocks as well.
>
> One potential solution could be to treat connectors that fail to start, stop,
> etc. in time similarly to tasks that fail to stop within the [task graceful
> shutdown timeout
> period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
> by handling all connector interactions on a separate thread, waiting for
> them to complete within a timeout, and abandoning the thread (and
> transitioning the connector to the {{FAILED}} state, if it has been created
> at all) if that timeout expires.
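To illustrate the standalone-herder problem described in the issue, here is a minimal Java sketch. It assumes a hypothetical `ToyHerder` class (not the real `StandaloneHerder`) whose methods are `synchronized` on the same instance. One thread hangs inside `createConnector`, standing in for a connector whose `start` never returns; a second thread then calls `deleteConnector` and cannot acquire the monitor until the first call returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo; ToyHerder is illustrative, not Kafka's StandaloneHerder.
public class SynchronizedHerderDemo {

    static class ToyHerder {
        // Simulates a connector call (e.g. start) that hangs until released.
        synchronized void createConnector(CountDownLatch entered, CountDownLatch hang)
                throws InterruptedException {
            entered.countDown(); // signal that this thread now holds the monitor
            hang.await();        // parks WITHOUT releasing the intrinsic lock
        }

        // Any other synchronized method needs the same monitor.
        synchronized String deleteConnector(String name) {
            return "deleted " + name;
        }
    }

    // Returns {deleteFinishedWhileCreateHung, deleteFinishedAfterCreateReturned}.
    static boolean[] runDemo(long waitMs) throws InterruptedException {
        ToyHerder herder = new ToyHerder();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch hang = new CountDownLatch(1);
        CountDownLatch deleted = new CountDownLatch(1);

        Thread creator = new Thread(() -> {
            try {
                herder.createConnector(entered, hang);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        creator.start();
        entered.await(); // creator is now inside createConnector

        Thread deleter = new Thread(() -> {
            herder.deleteConnector("my-connector");
            deleted.countDown();
        });
        deleter.start();

        // The deleter is stuck waiting for the monitor held by the creator.
        boolean whileHung = deleted.await(waitMs, TimeUnit.MILLISECONDS);

        hang.countDown(); // let the "connector" return, releasing the monitor
        boolean afterRelease = deleted.await(5, TimeUnit.SECONDS);
        creator.join();
        deleter.join();
        return new boolean[] {whileHung, afterRelease};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = runDemo(500);
        System.out.println("delete finished while create hung: " + result[0]);
        System.out.println("delete finished after create returned: " + result[1]);
    }
}
```

The first check is deterministic: `CountDownLatch.await()` parks the thread but, unlike `Object.wait()`, does not release the monitor, so the delete cannot proceed until the hung call returns.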
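The proposed fix (run each connector interaction on a separate thread, wait for it within a timeout, and on expiry abandon the thread and move the connector to {{FAILED}}) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `ConnectorCallRunner`, `ConnectorState`, and `ConnectorTimeoutException` are hypothetical names, not Kafka Connect APIs; the timeout is a plain constructor argument, which is one way it could be made configurable; and the thrown exception stands in for the error response a REST request would receive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: these names are illustrative, not Kafka Connect APIs.
public class ConnectorCallRunner {

    enum ConnectorState { RUNNING, FAILED }

    // Stands in for the error response returned to a REST caller.
    static class ConnectorTimeoutException extends RuntimeException {
        ConnectorTimeoutException(String message) {
            super(message);
        }
    }

    private final long timeoutMs; // assumed to come from some worker setting
    private volatile ConnectorState state = ConnectorState.RUNNING;

    // Cached pool: a hung call keeps its thread, later calls get new ones.
    // Daemon threads, so an abandoned call cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "connector-call");
        t.setDaemon(true);
        return t;
    });

    ConnectorCallRunner(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    ConnectorState state() {
        return state;
    }

    // Runs one connector interaction (start, stop, taskConfigs, ...) off the
    // caller's thread and waits at most timeoutMs for it to finish.
    <T> T call(String method, Callable<T> connectorCall)
            throws InterruptedException, ExecutionException {
        Future<T> future = executor.submit(connectorCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Abandon the thread (no join, no interrupt) and fail the connector.
            state = ConnectorState.FAILED;
            throw new ConnectorTimeoutException(
                    method + " did not complete within " + timeoutMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        ConnectorCallRunner runner = new ConnectorCallRunner(500);
        System.out.println("version: " + runner.call("version", () -> "1.0.0"));
        try {
            // Simulates a connector whose start() hangs forever.
            runner.call("start", () -> {
                Thread.sleep(Long.MAX_VALUE);
                return null;
            });
        } catch (ConnectorTimeoutException e) {
            System.out.println("error response: " + e.getMessage());
        }
        System.out.println("state: " + runner.state());
        // The caller's thread is free again, so later requests still proceed.
        System.out.println("config: " + runner.call("config", () -> "ok"));
    }
}
```

A cached thread pool is used so that an abandoned call does not occupy the only worker thread; each later call can still get a fresh thread. Whether a real implementation should also interrupt the abandoned thread is left open here; this sketch only stops waiting on it, matching the "abandoning the thread" wording of the proposal.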
--
This message was sent by Atlassian Jira
(v8.3.4#803005)