gharris1727 opened a new pull request #9765: URL: https://github.com/apache/kafka/pull/9765
When two cooperative rebalances take place in quick succession, the prior rebalance may not complete before the next one starts. Under Eager rebalancing, no tasks would have been started, so the subsequent onRevoked call is intentionally skipped whenever rebalanceResolved is false. Under Cooperative rebalancing, the same logic causes the DistributedHerder to skip stopping the connectors/tasks that are revoked in the second rebalance. The DistributedHerder still removes the revoked connectors/tasks from its assignment, so the DistributedHerder and the Worker end up with different views of which connectors/tasks are running. As a result, the connector/task instances that should have been stopped disappear from the rebalance protocol and are left running until their workers are halted or they fail.

Connectors/tasks that the rebalance protocol then reassigns to other workers are duplicated and run concurrently with the zombie connectors/tasks. Connectors/tasks that are reassigned back to the same worker encounter exceptions in Worker indicating that the connector/task already exists and is running.

* Add a test for revoking and then reassigning a connector under normal circumstances
* Add a test for revoking and then reassigning a connector following an incomplete cooperative rebalance
* Change expectRebalance to make assignment fields mutable before passing them into the DistributedHerder
* Only skip revocation for the Eager protocol, and never skip revocation for cooperative/sessioned protocols

Signed-off-by: Greg Harris <gr...@confluent.io>

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
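
The revocation rule described in this pull request can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the DistributedHerder code: the class `RevocationSketch`, the `Protocol` enum, the two sets, and both `skipRevocation...` helpers are hypothetical names invented for illustration. It shows how skipping revocation after an unresolved rebalance leaves a task running that the herder no longer tracks, and how restricting the skip to the Eager protocol avoids that.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class RevocationSketch {
    // Hypothetical stand-in for the rebalance protocol modes named in the description.
    enum Protocol { EAGER, COOPERATIVE, SESSIONED }

    // Hypothetical model: what the worker is actually running.
    final Set<String> running = new HashSet<>();
    // Hypothetical model: what the herder believes is assigned to it.
    final Set<String> assigned = new HashSet<>();

    // Behaviour as described before the change: skip whenever the
    // previous rebalance was not resolved, regardless of protocol.
    static boolean skipRevocationBefore(Protocol protocol, boolean rebalanceResolved) {
        return !rebalanceResolved;
    }

    // Behaviour as described after the change: only the Eager protocol may skip.
    static boolean skipRevocationAfter(Protocol protocol, boolean rebalanceResolved) {
        return protocol == Protocol.EAGER && !rebalanceResolved;
    }

    // The herder always drops revoked items from its assignment;
    // the worker only stops them when revocation is not skipped.
    void onRevoked(Set<String> revoked, boolean skip) {
        assigned.removeAll(revoked);
        if (!skip) {
            running.removeAll(revoked);
        }
    }

    // Items still running that the herder no longer knows about.
    Set<String> zombies() {
        Set<String> result = new HashSet<>(running);
        result.removeAll(assigned);
        return result;
    }

    static RevocationSketch withTask(String task) {
        RevocationSketch s = new RevocationSketch();
        s.running.add(task);
        s.assigned.add(task);
        return s;
    }

    public static void main(String[] args) {
        Set<String> revoked = Collections.singleton("task-0");

        RevocationSketch before = withTask("task-0");
        before.onRevoked(revoked, skipRevocationBefore(Protocol.COOPERATIVE, false));
        System.out.println("zombies before the change: " + before.zombies());

        RevocationSketch after = withTask("task-0");
        after.onRevoked(revoked, skipRevocationAfter(Protocol.COOPERATIVE, false));
        System.out.println("zombies after the change: " + after.zombies());
    }
}
```

In this model the skip is still allowed for `EAGER`, on the reasoning given in the description that nothing was started yet, so there is nothing to stop.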