ableegoldman opened a new pull request #10985:
URL: https://github.com/apache/kafka/pull/10985


   The primary goal of this PR is to address the problem we've seen in the wild 
in which the ConsumerCoordinator fails to update its SubscriptionState and 
ultimately feeds invalid `ownedPartitions` data as input to the assignor. 
Previously the assignor would detect that something was wrong and just throw an 
exception, now we make several efforts to detect this earlier in the assignment 
process and then fix it if possible, and work around it if not.
   
   Specifically, this PR does a few things:
   1) Bring the `generation` field back to the CooperativeStickyAssignor so we 
don't need to rely so heavily on the ConsumerCoordinator properly updating its 
SubscriptionState after eg falling out of the group. The plain StickyAssignor 
always used the generation since it had to, so we just make sure the 
CooperativeStickyAssignor has this tool as well
   2) In case of unforeseen problems or further bugs that slip past the 
`generation` field safety net, the assignor will now explicitly look out for 
partitions that are being claimed by multiple consumers as owned in the same 
generation. Such a case should never occur, but if it does, we have to 
invalidate this partition from the `ownedPartitions` of both consumers, since 
we can't tell who, if anyone, has the valid claim to this partition.
   3) Fix a subtle bug that I discovered while writing tests for the above two 
fixes: in the constrained algorithm, we compute the exact number of partitions 
each consumer should end up with, and keep track of the "unfilled" members who 
must -- or _might_ -- require more partitions to hit their quota. The problem 
was that members at the `minQuota` were being considered as "unfilled" even 
after we had already hit the maximum number of consumers allowed to go up to 
the `maxQuota`, meaning those `minQuota` members could/should not accept any 
more partitions beyond that. I believe this was introduced in 
[#10509](https://github.com/apache/kafka/pull/10509), so it shouldn't be in any 
released versions and does not need to be backported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to