Mahmoud-Riad00 opened a new issue, #45997: URL: https://github.com/apache/airflow/issues/45997
### Apache Airflow version

2.10.4

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

When brokers notify the controller about replica placements, a problem occurs during partition reassignments. When a broker is no longer a replica for a partition, the controller responds with a NOT_LEADER_OR_FOLLOWER error. AssignmentsManager nevertheless keeps retrying these requests, even though the broker is no longer a replica.

These unnecessary retries add load to the system and put avoidable noise in the logs. They can degrade performance during large-scale cluster changes or frequent reassignments. There should be a mechanism that detects this case and stops retrying.

### What you think should happen instead?

When a broker is no longer a replica, AssignmentsManager should drop the assignment and stop retrying failed requests for it.

### How to reproduce

- Set up a Kafka cluster and assign replicas to a few brokers.
- Reassign partitions so that a broker is removed as a replica for one or more partitions.
- The controller starts responding to the removed broker's replica placement requests with a NOT_LEADER_OR_FOLLOWER error.
- Check the AssignmentsManager logs. It keeps retrying requests for the broker that is no longer a replica.

### Operating System

Ubuntu 20.04 LTS

### Versions of Apache Airflow Providers

- apache-airflow-providers-kafka==1.4.0
- apache-airflow-providers-postgres==2.6.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

This is a self-hosted deployment running in a Kubernetes cluster and managed with Helm, using Helm Chart version 8.7.0. Most configuration is standard, apart from some customized resource limits.

### Anything else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!
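The expected behavior described under "What you think should happen instead?" can be illustrated with the minimal Python sketch below. It models a response handler that treats NOT_LEADER_OR_FLOWER-style rejections as final for that partition and drops the assignment, while other errors, such as a timeout, are still retried. This is not Kafka's actual AssignmentsManager, which lives in the broker's JVM codebase, and it does not reflect that component's API. The classes `Error`, `Assignment`, and `PendingAssignments`, the `handle_response` method, and the convention that retries are driven by the returned list are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Error(Enum):
    """Hypothetical subset of per-partition results from the controller."""
    NONE = auto()
    NOT_LEADER_OR_FOLLOWER = auto()
    REQUEST_TIMED_OUT = auto()


@dataclass(frozen=True)
class Assignment:
    """Hypothetical replica-placement notification for one partition."""
    topic: str
    partition: int
    directory_id: str


@dataclass
class PendingAssignments:
    """Hypothetical stand-in for the manager's queue of unacknowledged assignments."""
    pending: dict[tuple[str, int], Assignment] = field(default_factory=dict)
    dropped: list[Assignment] = field(default_factory=list)

    def enqueue(self, a: Assignment) -> None:
        self.pending[(a.topic, a.partition)] = a

    def handle_response(self, results: dict[tuple[str, int], Error]) -> list[Assignment]:
        """Apply per-partition results and return only the assignments worth retrying."""
        retry: list[Assignment] = []
        for key, error in results.items():
            a = self.pending.get(key)
            if a is None:
                continue  # already acknowledged or dropped
            if error is Error.NONE:
                del self.pending[key]  # controller accepted the placement
            elif error is Error.NOT_LEADER_OR_FOLLOWER:
                # The broker is no longer a replica: retrying cannot succeed, so drop it.
                del self.pending[key]
                self.dropped.append(a)
            else:
                retry.append(a)  # transient error: keep it pending and retry later
        return retry


q = PendingAssignments()
for p in range(3):
    q.enqueue(Assignment("orders", p, "dir-1"))
retry = q.handle_response({
    ("orders", 0): Error.NONE,
    ("orders", 1): Error.NOT_LEADER_OR_FOLLOWER,
    ("orders", 2): Error.REQUEST_TIMED_OUT,
})
print([a.partition for a in retry])      # [2]
print([a.partition for a in q.dropped])  # [1]
print(list(q.pending))                   # [('orders', 2)]
```

Making NOT_LEADER_OR_FOLLOWER terminal for the affected partition is what breaks the retry loop. Transient failures keep their normal retry path, so the change only removes retries that could never succeed.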
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
