yashmayya opened a new pull request, #13276:
URL: https://github.com/apache/kafka/pull/13276

   - Kafka Connect in distributed mode currently retries indefinitely with a 
fixed retry backoff (`250 ms`) when errors occur during connector task 
reconfiguration.
   - Tasks can be "reconfigured" during connector startup (to get the initial 
task configs from the connector), when a connector is resumed, or when a 
connector explicitly requests it via its context.
   - Task reconfiguration consists of asking a connector instance for its 
task configs and writing them to the Connect cluster's config storage if a 
change in task configs is detected.
   - A fixed retry backoff of `250 ms` leads to very aggressive retries. 
Consider a Debezium connector that attempts to open a database connection in 
its [taskConfigs method](https://github.com/debezium/debezium/blob/bf347da71ad9b0819998a3bc9754b3cc96cc1563/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnector.java#L63).
If the connection fails due to something like an invalid login, the Connect 
worker will retry the connection roughly every 250 ms, indefinitely, until 
the connector config or the database-side configs are fixed.
   - An exponential backoff retry mechanism seems better suited for the 
`DistributedHerder::reconfigureConnectorTasksWithRetry` method. The initial 
retry backoff is retained as `250 ms`, with a chosen maximum backoff of 
`60000 ms`.
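
To illustrate the effect of the proposed bounds, here is a minimal sketch of a capped exponential backoff schedule. It is not the code in this PR: the class name `ReconfigurationBackoffSketch`, the method `backoffMs`, the multiplier of 2, and the absence of jitter are all assumptions made for illustration. Only the `250 ms` initial value and the `60000 ms` maximum come from the description above.

```java
// Hypothetical sketch, not the DistributedHerder implementation.
final class ReconfigurationBackoffSketch {
    // Values taken from the PR description.
    static final long INITIAL_BACKOFF_MS = 250L;
    static final long MAX_BACKOFF_MS = 60_000L;
    // Assumption: the backoff doubles on each failed attempt, with no jitter.
    static final int MULTIPLIER = 2;

    /** Returns the delay before retry number {@code attempt} (0-based). */
    static long backoffMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long backoff = INITIAL_BACKOFF_MS;
        // Stop multiplying once the cap is reached, which also avoids overflow.
        for (int i = 0; i < attempt && backoff < MAX_BACKOFF_MS; i++) {
            backoff *= MULTIPLIER;
        }
        return Math.min(backoff, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        // Print the delay for the first ten retries.
        for (int attempt = 0; attempt < 10; attempt++) {
            System.out.println("retry " + attempt + ": " + backoffMs(attempt) + " ms");
        }
    }
}
```

Under these assumptions the delay reaches the `60000 ms` cap on the ninth retry, so a persistently failing connector is retried about once a minute instead of several times per second.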
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


