[ https://issues.apache.org/jira/browse/KAFKA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838828#comment-16838828 ]
ASF GitHub Bot commented on KAFKA-8341:
---------------------------------------

vikasconfluent commented on pull request #6723: KAFKA-8341. Retry Consumer group operation for NOT_COORDINATOR error
URL: https://github.com/apache/kafka/pull/6723

An API call for consumer groups is made up of two calls:
1. Find the consumer group coordinator.
2. Send the request to the node found in step 1.

However, the coordinator can move between steps 1 and 2. In that case we currently fail. This change fixes that by detecting this error and then retrying.

The following APIs are affected by this behavior:
1. listConsumerGroupOffsets
2. deleteConsumerGroups
3. describeConsumerGroups

Each of these calls results in AdminClient making multiple calls to the backend. Because AdminClient invokes each backend API as a separate call on its event loop, the code that detects the error (step 2) needs to restart the whole operation, including step 1. This required a change so that step 2 has access to the "Call" object for step 1.

This change therefore refactors the code to make it easy to retry the whole operation. It creates a Context object that captures the API arguments, which each "Call" object can then refer to. This is purely for convenience: method signatures become simpler because only one object is passed instead of multiple API arguments. Each "Call" object is created in its own method, so step 2 can easily resubmit step 1.

This change also modifies the corresponding unit tests to cover this scenario.
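The retry structure described above can be illustrated with a small, self-contained Java simulation. It is a sketch under stated assumptions, not the PR's code: `FakeCluster`, `ErrorCode`, `Context`, the factory methods, and the `Runnable` queue standing in for the event loop are all hypothetical and are not KafkaAdminClient internals or Kafka API. It shows the two ideas from the PR: the API arguments live in one `Context` object, and each call is built by its own method, so the step-2 handler can resubmit a fresh step-1 lookup when it sees NOT_COORDINATOR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of the retry pattern; none of these types are
// KafkaAdminClient internals or part of Kafka's public API.
public class CoordinatorRetrySketch {

    // Hypothetical stand-in for the broker error codes involved.
    enum ErrorCode { NONE, NOT_COORDINATOR }

    // Hypothetical fake cluster whose group coordinator can move.
    static final class FakeCluster {
        private int coordinator = 1;
        private int movesRemaining; // number of lookups after which the coordinator moves

        FakeCluster(int movesRemaining) {
            this.movesRemaining = movesRemaining;
        }

        // Step 1: find the coordinator. It may move right after answering,
        // simulating a move between step 1 and step 2.
        int findCoordinator(String groupId) {
            int found = coordinator;
            if (movesRemaining > 0) {
                movesRemaining--;
                coordinator++;
            }
            return found;
        }

        // Step 2: the group request only succeeds on the current coordinator.
        ErrorCode describeGroup(int node, String groupId) {
            return node == coordinator ? ErrorCode.NONE : ErrorCode.NOT_COORDINATOR;
        }
    }

    // Context: captures the API arguments once so every call can refer to them.
    static final class Context {
        final String groupId;
        final int maxAttempts; // retry cap: an assumption of this sketch
        int attempts = 0;
        ErrorCode result = null;

        Context(String groupId, int maxAttempts) {
            this.groupId = groupId;
            this.maxAttempts = maxAttempts;
        }
    }

    private final Deque<Runnable> eventLoop = new ArrayDeque<>();
    private final FakeCluster cluster;

    CoordinatorRetrySketch(FakeCluster cluster) {
        this.cluster = cluster;
    }

    // Step 1 is built by its own method so step 2 can resubmit it.
    private Runnable findCoordinatorCall(Context ctx) {
        return () -> {
            ctx.attempts++;
            int node = cluster.findCoordinator(ctx.groupId);
            eventLoop.add(describeGroupCall(ctx, node));
        };
    }

    // Step 2: on NOT_COORDINATOR, restart the whole operation from step 1.
    private Runnable describeGroupCall(Context ctx, int node) {
        return () -> {
            ErrorCode error = cluster.describeGroup(node, ctx.groupId);
            if (error == ErrorCode.NOT_COORDINATOR && ctx.attempts < ctx.maxAttempts) {
                eventLoop.add(findCoordinatorCall(ctx));
            } else {
                ctx.result = error;
            }
        };
    }

    // Queues step 1 and drains the loop; each call runs on its own turn.
    Context describeConsumerGroup(String groupId, int maxAttempts) {
        Context ctx = new Context(groupId, maxAttempts);
        eventLoop.add(findCoordinatorCall(ctx));
        while (!eventLoop.isEmpty()) {
            eventLoop.poll().run();
        }
        return ctx;
    }

    public static void main(String[] args) {
        // The coordinator moves after each of the first two lookups.
        CoordinatorRetrySketch admin = new CoordinatorRetrySketch(new FakeCluster(2));
        Context ctx = admin.describeConsumerGroup("my-group", 5);
        System.out.println(ctx.result + " after " + ctx.attempts + " lookups");
    }
}
```

Running `main` prints `NONE after 3 lookups`: the simulated coordinator moves twice, so the operation succeeds only once step 1 is repeated and finds the current coordinator. Bounding retries with `maxAttempts` is this sketch's own choice; the PR description does not say how retries are limited.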
> AdminClient should retry coordinator lookup after NOT_COORDINATOR error
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-8341
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8341
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Vikas Singh
>            Priority: Major
>
> If a group operation (e.g. DescribeGroup) fails because the coordinator has
> moved, the AdminClient should look up the coordinator before retrying the
> operation. Currently we will either fail or just retry anyway. This is
> similar in some ways to controller rediscovery after getting NOT_CONTROLLER
> errors.