Ryanne Dolan created KAFKA-12726:
------------------------------------
Summary: misbehaving Task.stop() can prevent other Tasks from
stopping
Key: KAFKA-12726
URL: https://issues.apache.org/jira/browse/KAFKA-12726
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 2.8.0
Reporter: Ryanne Dolan
We've observed a misbehaving Task fail to stop in a timely manner (e.g. stuck
in a retry loop). Although Connect supports the property
task.shutdown.graceful.timeout.ms, the timeout is not currently enforced: a task
can take as long as it wants to stop, and the only consequence is an error message.
Unfortunately, Workers stop Tasks sequentially, meaning that a stuck Task can
prevent any further Tasks from stopping. Moreover, after a rebalance, these
lingering tasks can persist along with their replacements. For example, we've
seen a Worker's "task-count" metric double following a rebalance.
While the Connector implementation is ultimately to blame here -- a Task
probably shouldn't loop forever in stop() -- we believe the Connect runtime
should handle this situation more gracefully.
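
One possible way to handle this, sketched here only to illustrate the idea and
not as a description of the Connect runtime or of any proposed patch: run each
stop action on its own thread and wait against a single shared deadline, so a
stuck stop() cannot delay the others. The class BoundedTaskStopper, the method
stopAll, and the use of plain Runnables in place of real Task objects are all
hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Kafka Connect. */
final class BoundedTaskStopper {

    /**
     * Runs every stop action concurrently and waits at most timeoutMs in total.
     * Returns the indices of the actions that did not finish before the deadline.
     */
    static List<Integer> stopAll(List<Runnable> stopActions, long timeoutMs)
            throws InterruptedException {
        // Daemon threads, so a stop() that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "task-stopper");
            t.setDaemon(true);
            return t;
        });

        // Submit everything first so no stop action waits behind another.
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable stop : stopActions) {
            futures.add(pool.submit(stop));
        }

        // One deadline shared by all tasks, rather than one timeout per task.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        List<Integer> stuck = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                futures.get(i).get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException e) {
                futures.get(i).cancel(true); // interrupt the stuck stop()
                stuck.add(i);
            } catch (ExecutionException e) {
                // stop() threw: treated as stopped here; a real runtime would log it.
            }
        }
        pool.shutdownNow();
        return stuck;
    }
}
```

Interrupting a stuck thread only helps if the stop() implementation responds to
interruption, so a sketch like this bounds how long the Worker waits but cannot
by itself guarantee that a lingering task actually goes away.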