[GitHub] [kafka] C0urante commented on pull request #11588: KAFKA-13485: Restart connectors after RetriableException raised from Task::start()

GitBox Mon, 20 Dec 2021 06:18:15 -0800


C0urante commented on pull request #11588:
URL: https://github.com/apache/kafka/pull/11588#issuecomment-997964564



   @blcksrx did you catch the [mailing list 
thread](https://www.mail-archive.com/dev@kafka.apache.org/msg120391.html) about 
this? I left this comment there:
   
   > I think there's some risk of introducing this retry behavior if we end up 
invoking Connector::start or Task::start on the same object multiple times. 
Unexpected behavior may result, such as double-allocation of resources that are 
initialized in the start method and which are meant to be released in the stop 
method. An alternative could be to invoke stop on the object to allow it to 
perform best-effort cleanup, then initialize an entirely new Connector or Task 
instance, and invoke its start method.
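
A minimal sketch of the alternative described in the quote: on a retriable failure, call stop on the failed object for best-effort cleanup, then build a brand-new instance and call start on that one instead. Every name here (`FreshInstanceRetry`, `Startable`, `RetriableStartException`, `startWithFreshInstances`, `maxAttempts`) is a hypothetical stand-in so the example compiles without Kafka on the classpath; it is not the worker's actual code or the Connect API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: none of these types are Kafka Connect classes.
class FreshInstanceRetry {

    /** Hypothetical stand-in for a retriable failure thrown from start. */
    static class RetriableStartException extends RuntimeException {
        RetriableStartException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified view of a connector's or task's start/stop lifecycle. */
    interface Startable {
        void start(Map<String, String> config);
        void stop();
    }

    /**
     * Tries to start an instance, never calling start twice on the same object.
     * Each attempt gets a new instance from the factory; a failed instance is
     * stopped (best effort) and then discarded.
     */
    static Startable startWithFreshInstances(
            Supplier<Startable> factory, Map<String, String> config, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetriableStartException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Startable instance = factory.get();
            try {
                instance.start(config);
                return instance;
            } catch (RetriableStartException e) {
                lastFailure = e;
                try {
                    // Let the failed instance release whatever start allocated.
                    instance.stop();
                } catch (RuntimeException cleanupFailure) {
                    // Best-effort cleanup: a failing stop must not block the retry.
                }
            }
        }
        // All attempts failed with retriable errors; surface the last one.
        throw lastFailure;
    }
}
```

Non-retriable exceptions propagate immediately, and the attempt cap is only one possible way to bound the loop.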
   
   It's worth keeping in mind that some existing connectors may already throw 
`RetriableException` from `start` without being written to tolerate a retry. 
If we add this behavior and someone upgrades their worker to a version that 
includes it, such a connector could end up being restarted in a loop and 
crippling their worker.
   
   
   
   On a separate note, with these changes, what would happen if a task were 
stuck in a retry loop, but then scheduled for shutdown (because of rebalance, 
deletion of the connector, reconfiguration, etc.)? If the answer is "the task 
will keep retrying until `start` either fails with a non-retriable error or 
succeeds" then we may want to refine the logic a little bit in order to avoid 
accruing zombie tasks in that situation.
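
One way the retry logic could account for shutdown is to check a stop signal before every attempt and to make the backoff wait on that same signal, so a task scheduled for shutdown leaves the loop promptly. This is a sketch under assumptions, not the PR's code: `CancellableStartRetry`, `requestStop`, `startWithRetry`, `Outcome`, and `RetriableStartException` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch only: none of these types are Kafka Connect classes.
class CancellableStartRetry {

    /** Hypothetical stand-in for a retriable failure thrown from start. */
    static class RetriableStartException extends RuntimeException {
        RetriableStartException(String message) {
            super(message);
        }
    }

    enum Outcome { STARTED, CANCELLED }

    // Counted down once when shutdown is requested (rebalance, deletion, etc.).
    private final CountDownLatch stopRequested = new CountDownLatch(1);

    /** Called by whoever schedules the task for shutdown. */
    void requestStop() {
        stopRequested.countDown();
    }

    /**
     * Retries startAttempt on retriable failures until it succeeds or a stop
     * is requested. Non-retriable exceptions propagate to the caller.
     */
    Outcome startWithRetry(Runnable startAttempt, long backoffMs) {
        while (stopRequested.getCount() > 0) {
            try {
                startAttempt.run();
                return Outcome.STARTED;
            } catch (RetriableStartException e) {
                try {
                    // Wait out the backoff, but wake up early if stop is requested.
                    if (stopRequested.await(backoffMs, TimeUnit.MILLISECONDS)) {
                        return Outcome.CANCELLED;
                    }
                } catch (InterruptedException interrupted) {
                    // Preserve the interrupt and treat it as a cancellation.
                    Thread.currentThread().interrupt();
                    return Outcome.CANCELLED;
                }
            }
        }
        return Outcome.CANCELLED;
    }
}
```

Waiting on the latch rather than sleeping means a shutdown request interrupts the backoff instead of waiting for it to elapse.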


