yashmayya opened a new pull request, #13276: URL: https://github.com/apache/kafka/pull/13276

- Kafka Connect in distributed mode currently retries indefinitely with a fixed retry backoff (`250 ms`) when errors occur during connector task reconfiguration.
- Tasks can be "reconfigured" during connector startup (to get the initial task configs from the connector), on a connector resume, or when a connector explicitly requests it via its context.
- Task reconfiguration essentially entails asking a connector instance for its task configs and writing them to the Connect cluster's config storage (if a change in task configs is detected).
- A fixed retry backoff of 250 ms leads to very aggressive retries - consider a Debezium connector which attempts to initiate a database connection in its [taskConfigs method](https://github.com/debezium/debezium/blob/bf347da71ad9b0819998a3bc9754b3cc96cc1563/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnector.java#L63). If the connection fails due to something like an invalid login, the Connect worker will essentially spam connection attempts frequently and indefinitely (until the connector config / database side configs are fixed).
- An exponential backoff retry mechanism seems better suited for the `DistributedHerder::reconfigureConnectorTasksWithRetry` method. The initial retry backoff is retained as `250 ms` with a chosen maximum backoff of `60000 ms`.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
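
For illustration only, the backoff schedule described above might look roughly like the sketch below. It is not the code in this PR: the class name `BackoffSketch`, the method `backoffMs`, the doubling factor of 2, and the absence of jitter are all assumptions. Only the `250 ms` initial value and the `60000 ms` cap come from the description.

```java
// Hypothetical sketch of a capped exponential backoff schedule.
// Assumptions (not from the PR): multiplier of 2, no jitter, these names.
public class BackoffSketch {
    static final long INITIAL_BACKOFF_MS = 250L;   // from the PR description
    static final long MAX_BACKOFF_MS = 60_000L;    // from the PR description

    // Returns the delay before retry number `attempt` (0-based).
    static long backoffMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Clamp the shift so the multiplication cannot overflow a long;
        // 250 << 20 is already far above the cap.
        int shift = Math.min(attempt, 20);
        long delay = INITIAL_BACKOFF_MS << shift;  // 250 * 2^attempt
        return Math.min(delay, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        // Print the first few delays to show where the cap takes effect.
        for (int attempt = 0; attempt <= 9; attempt++) {
            System.out.println("retry " + attempt + " -> " + backoffMs(attempt) + " ms");
        }
    }
}
```

Under these assumptions the delays run 250, 500, 1000, ... 32000 ms and then stay at 60000 ms from the ninth retry onward, so a persistently failing `taskConfigs` call is retried about once a minute rather than four times a second.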
