[
https://issues.apache.org/jira/browse/KAFKA-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Randall Hauch resolved KAFKA-7941.
----------------------------------
Resolution: Fixed
Reviewer: Randall Hauch
Fix Version/s: 2.0.2, 2.1.2, 2.2.2, 2.3.1, 2.4.0
Merged to the `trunk`, `2.3`, `2.2`, `2.1`, and `2.0` branches.
Thanks, [~pgwhalen]!
> Connect KafkaBasedLog work thread terminates when getting offsets fails
> because broker is unavailable
> -----------------------------------------------------------------------------------------------------
>
> Key: KAFKA-7941
> URL: https://issues.apache.org/jira/browse/KAFKA-7941
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Paul Whalen
> Assignee: Paul Whalen
> Priority: Minor
> Fix For: 2.0.2, 2.1.2, 2.2.2, 2.4.0, 2.3.1
>
>
> My team has run into this Connect bug regularly in the last six months while
> doing infrastructure maintenance that causes intermittent broker availability
> issues. I'm a little surprised it exists given how routinely it affects us,
> so perhaps someone in the know can point out if our setup is somehow just
> incorrect. My team is running 2.0.0 on both the broker and client, though
> from what I can tell from reading the code, the issue continues to exist
> through 2.2; at least, I was able to write a failing unit test that I believe
> reproduces it.
> When a {{KafkaBasedLog}} worker thread in the Connect runtime calls
> {{readLogToEnd}} and brokers are unavailable, the {{TimeoutException}} from
> the consumer {{endOffsets}} call goes uncaught all the way up to the
> top-level {{catch (Throwable t)}}, effectively killing the thread until
> Connect is restarted. The result is that Connect stops functioning entirely,
> with no indication other than the single {{ERROR}} line logged by that catch
> block; tasks still show as running.
> The proposed fix is to simply catch and log the {{TimeoutException}},
> allowing the worker thread to retry forever.
> Alternatively, perhaps there is no expectation that Connect should be
> able to recover following broker unavailability, though that would be
> disappointing. I would at least hope for a louder failure than the
> single {{ERROR}} log.
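
For anyone reading this later, the catch-and-retry idea described above can be sketched roughly as follows. This is only an illustration, not the merged patch: `BrokerTimeoutException` is a hypothetical stand-in for Kafka's `TimeoutException` so the sketch compiles without the client jar, `runWorkLoop` and the `Supplier` are invented names standing in for the work thread and the `endOffsets` call, and the loop is bounded by `maxIterations` purely so the demo terminates (the description above proposes retrying forever).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.kafka.common.errors.TimeoutException,
// used only so this sketch has no dependency on the Kafka client.
class BrokerTimeoutException extends RuntimeException {
    BrokerTimeoutException(String message) {
        super(message);
    }
}

class WorkThreadRetrySketch {
    // Collected "log lines", kept in memory so the behavior is easy to inspect.
    static final List<String> log = new ArrayList<>();

    /**
     * Simplified work loop. The timeout is caught inside the loop, so a
     * temporarily unavailable broker leads to another attempt instead of
     * ending the loop. Any other exception still propagates to the caller.
     *
     * @param endOffsetReader hypothetical stand-in for the consumer endOffsets call
     * @param maxIterations   bound added only for this demo
     * @return the end offset once a read succeeds, or -1 if every attempt timed out
     */
    static long runWorkLoop(Supplier<Long> endOffsetReader, int maxIterations) {
        for (int attempt = 1; attempt <= maxIterations; attempt++) {
            try {
                return endOffsetReader.get();
            } catch (BrokerTimeoutException e) {
                // Log and fall through to the next iteration rather than
                // letting the exception escape the loop.
                log.add("WARN attempt " + attempt + " timed out: " + e.getMessage());
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated broker: unavailable for the first two calls, then healthy.
        Supplier<Long> flakyBroker = () -> {
            if (++calls[0] <= 2) {
                throw new BrokerTimeoutException("broker unavailable");
            }
            return 42L;
        };

        long endOffset = runWorkLoop(flakyBroker, 10);
        System.out.println("end offset = " + endOffset + " after " + calls[0] + " calls");
        log.forEach(System.out::println);
    }
}
```

The point of the sketch is where the `catch` sits: inside the loop, so only the timeout is absorbed, while anything unexpected still reaches whatever outer handler exists.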