[ 
https://issues.apache.org/jira/browse/KAFKA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846904#comment-16846904
 ] 

ASF GitHub Bot commented on KAFKA-8341:
---------------------------------------

soondenana commented on pull request #6723: KAFKA-8341. Retry Consumer group 
operation for NOT_COORDINATOR error
URL: https://github.com/apache/kafka/pull/6723
 
 
   An API call for consumer groups is made up of two calls:
   1. Find the consumer group coordinator
   2. Send the request to the node found in step 1
   
   But the coordinator can move between steps 1 and 2. In that case step 2
   gets a NOT_COORDINATOR error and we currently fail the operation. This
   change fixes that by detecting the error and retrying.
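
A minimal sketch of the lookup-then-send retry described above. This is not the AdminClient code from the PR: `FakeCluster`, `NotCoordinatorException`, `describeWithRetry`, and the `moveToAfterLookup` hook are hypothetical names, used only to simulate a coordinator that moves between step 1 and step 2.

```java
public class CoordinatorRetrySketch {
    /** Hypothetical error raised by a node that no longer coordinates the group. */
    static class NotCoordinatorException extends RuntimeException {
        NotCoordinatorException(String message) { super(message); }
    }

    /** Hypothetical toy cluster with a single group; not a Kafka API. */
    static class FakeCluster {
        int coordinator;            // node currently coordinating the group
        Integer moveToAfterLookup;  // if set, the coordinator moves right after the next lookup
        int lookups = 0;

        FakeCluster(int coordinator) { this.coordinator = coordinator; }

        // Step 1: find the consumer group coordinator.
        int findCoordinator(String groupId) {
            lookups++;
            int found = coordinator;
            if (moveToAfterLookup != null) {
                coordinator = moveToAfterLookup;  // simulate a move between step 1 and step 2
                moveToAfterLookup = null;
            }
            return found;
        }

        // Step 2: send the request to the node found in step 1.
        String describeGroup(int nodeId, String groupId) {
            if (nodeId != coordinator) {
                throw new NotCoordinatorException("node " + nodeId + " is not the coordinator");
            }
            return groupId + "@node" + nodeId;
        }
    }

    /** Runs step 1 and step 2, redoing step 1 whenever step 2 reports a stale coordinator. */
    static String describeWithRetry(FakeCluster cluster, String groupId, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        NotCoordinatorException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int node = cluster.findCoordinator(groupId);
            try {
                return cluster.describeGroup(node, groupId);
            } catch (NotCoordinatorException e) {
                last = e;  // the cached coordinator is stale, so look it up again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster(1);
        cluster.moveToAfterLookup = 2;
        System.out.println(describeWithRetry(cluster, "my-group", 3)); // my-group@node2
        System.out.println(cluster.lookups);                           // 2
    }
}
```

The point of the sketch is that the retry covers both steps: repeating only step 2 would keep hitting the stale node.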
   
   The following APIs are impacted by this behavior:
   1. listConsumerGroupOffsets
   2. deleteConsumerGroups
   3. describeConsumerGroups
   
   Each of these calls results in AdminClient making multiple calls to the
   backend. As the AdminClient code invokes each backend API in a separate
   event loop, the code that detects the error (step 2) needs to restart the
   whole operation, including step 1. This required a change to capture the
   "Call" object for step 1 in step 2.
   
   This change thus refactors the code to make it easy to retry the whole
   operation. It creates a Context object to capture the API arguments, which
   each "Call" object can then refer to. This is just for convenience and
   makes method signatures simpler, as we only need to pass one object instead
   of multiple API arguments.
   
   The creation of each "Call" object is done in a new method, so we can
   easily resubmit step 1 in step 2.
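
The shape of this refactoring can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Call`, `DescribeContext`, `Loop`, `findCoordinatorCall`, and `describeCall` are hypothetical names, and the queue stands in for the AdminClient event loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ContextRetrySketch {
    /** Hypothetical unit of work submitted to the event loop. */
    interface Call { void run(); }

    /** Hypothetical context: captures the API arguments and shared state once. */
    static class DescribeContext {
        final String groupId;
        int retriesLeft;
        int coordinator = -1;  // filled in by step 1
        String result;
        String error;

        DescribeContext(String groupId, int retriesLeft) {
            this.groupId = groupId;
            this.retriesLeft = retriesLeft;
        }
    }

    /** Hypothetical event loop plus a toy backend with one group. */
    static class Loop {
        final Queue<Call> pending = new ArrayDeque<>();
        final List<String> log = new ArrayList<>();
        int actualCoordinator;
        Integer moveTo;  // if set, the coordinator moves right after the next lookup

        Loop(int actualCoordinator) { this.actualCoordinator = actualCoordinator; }

        void runAll() {
            Call call;
            while ((call = pending.poll()) != null) {
                call.run();
            }
        }
    }

    // Step 1 is built in its own method so that step 2 can create it again.
    static Call findCoordinatorCall(Loop loop, DescribeContext ctx) {
        return () -> {
            loop.log.add("find");
            ctx.coordinator = loop.actualCoordinator;
            if (loop.moveTo != null) {
                loop.actualCoordinator = loop.moveTo;  // simulate the coordinator moving
                loop.moveTo = null;
            }
            loop.pending.add(describeCall(loop, ctx));
        };
    }

    // Step 2: on a stale coordinator, resubmit step 1 using the same context.
    static Call describeCall(Loop loop, DescribeContext ctx) {
        return () -> {
            loop.log.add("describe@" + ctx.coordinator);
            if (ctx.coordinator != loop.actualCoordinator) {
                if (ctx.retriesLeft > 0) {
                    ctx.retriesLeft--;
                    loop.pending.add(findCoordinatorCall(loop, ctx));
                } else {
                    ctx.error = "NOT_COORDINATOR";
                }
                return;
            }
            ctx.result = ctx.groupId + "@node" + ctx.coordinator;
        };
    }

    public static void main(String[] args) {
        Loop loop = new Loop(1);
        loop.moveTo = 2;
        DescribeContext ctx = new DescribeContext("my-group", 1);
        loop.pending.add(findCoordinatorCall(loop, ctx));
        loop.runAll();
        System.out.println(loop.log);    // [find, describe@1, find, describe@2]
        System.out.println(ctx.result);  // my-group@node2
    }
}
```

Because both factory methods take only the loop and the context, step 2 needs no extra arguments to rebuild step 1.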
   
   This change also modifies the corresponding unit test to cover this scenario.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AdminClient should retry coordinator lookup after NOT_COORDINATOR error
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-8341
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8341
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Vikas Singh
>            Priority: Major
>
> If a group operation (e.g. DescribeGroup) fails because the coordinator has 
> moved, the AdminClient should look up the coordinator again before retrying 
> the operation. Currently we will either fail or just retry anyway. This is 
> similar in some ways to controller rediscovery after getting NOT_CONTROLLER 
> errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
