[ https://issues.apache.org/jira/browse/KAFKA-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch resolved KAFKA-7941.
----------------------------------
       Resolution: Fixed
         Reviewer: Randall Hauch
    Fix Version/s: 2.3.1, 2.4.0, 2.2.2, 2.1.2, 2.0.2

Merged to the `trunk`, `2.3`, `2.2`, `2.1`, and `2.0` branches. Thanks, [~pgwhalen]!

> Connect KafkaBasedLog work thread terminates when getting offsets fails because broker is unavailable
> -----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7941
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7941
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Paul Whalen
>            Assignee: Paul Whalen
>            Priority: Minor
>             Fix For: 2.0.2, 2.1.2, 2.2.2, 2.4.0, 2.3.1
>
>
> My team has run into this Connect bug regularly in the last six months while
> doing infrastructure maintenance that causes intermittent broker availability
> issues. I'm a little surprised it exists given how routinely it affects us,
> so perhaps someone in the know can point out if our setup is somehow just
> incorrect. My team is running 2.0.0 on both the broker and client, though
> from what I can tell from reading the code, the issue continues to exist
> through 2.2; at least, I was able to write a failing unit test that I believe
> reproduces it.
>
> When a {{KafkaBasedLog}} worker thread in the Connect runtime calls
> {{readLogToEnd}} and brokers are unavailable, the {{TimeoutException}} from
> the consumer {{endOffsets}} call goes uncaught all the way up to the top-level
> {{catch (Throwable t)}}, effectively killing the thread until Connect is
> restarted. The result is that Connect stops functioning entirely, with no
> indication except for that log line; tasks still show as running.
>
> The proposed fix is to simply catch and log the {{TimeoutException}},
> allowing the worker thread to retry forever.
>
> Alternatively, perhaps there is not an expectation that Connect should be
> able to recover following broker unavailability, though that would be
> disappointing. I would at least hope for a louder failure than the
> single {{ERROR}} log.
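
For readers following the thread, the fix proposed in the description (catch and log the timeout so the work thread keeps retrying) can be sketched roughly as below. This is an illustration only, not the merged patch: `BrokerTimeoutException` is a hypothetical stand-in for Kafka's unchecked `org.apache.kafka.common.errors.TimeoutException`, `WorkThreadSketch` and `runLoop` are made-up names, and the loop is bounded by `maxIterations` so the sketch terminates, whereas the real work thread would keep going.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for Kafka's TimeoutException, which is also unchecked.
class BrokerTimeoutException extends RuntimeException {
    BrokerTimeoutException(String message) {
        super(message);
    }
}

// Hypothetical, simplified model of a work thread's read loop.
class WorkThreadSketch {
    // Log lines are collected in memory so the behavior is easy to inspect.
    final List<String> log = new ArrayList<>();

    // Runs maxIterations passes of the loop and returns how many succeeded.
    // readLogToEnd is an assumed callback that returns the end offset or
    // throws BrokerTimeoutException when brokers are unavailable.
    int runLoop(Supplier<Long> readLogToEnd, int maxIterations) {
        int successes = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                long endOffset = readLogToEnd.get();
                log.add("read to end offset " + endOffset);
                successes++;
            } catch (BrokerTimeoutException e) {
                // Catch and log here so the exception never reaches an outer
                // catch (Throwable) that would end the thread; the next pass
                // of the loop acts as the retry.
                log.add("WARN timeout reading log to end, will retry: " + e.getMessage());
            }
        }
        return successes;
    }
}
```

With a callback that times out twice and then succeeds, the loop records two warnings and carries on reading instead of dying on the first failure. A real implementation would likely also want a backoff between retries, which this sketch leaves out.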