This is great!

For the client-side implementation, I think duplicate assignments are still 
possible. I’ll try to walk through an example here. 

Let’s say there are 2 consumers and 1 topic with 2 partitions. 

After the initial rebalance, generation 0:
Consumer A has partition 0
Consumer B has partition 1

Let’s say consumer B leaves the group (long debug, GC pause). This leads to 
another rebalance. This rebalance will be considered generation 1 and will 
result in:

Generation 1, Consumer A owns partition 0,1

Now let’s say Consumer B is still out of the group and then Consumer A leaves 
as well. While Consumer A is out of the group, Consumer B rejoins the group. 
During this rebalance, the only previous state would be the initial generation 
0 assignment. So this assignment would be considered generation 1 as well and 
would result in:

Generation 1, Consumer B owns partition 0,1

When A rejoins the group, both consumers would claim ownership of both 
partitions and they would report the assignment was from generation 1. This 
gets us back into the same issue as before because the generation number cannot 
help at all. You could add a timestamp in addition to the generation marker, 
but that’d still be vulnerable to clock skew.
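
To make the walkthrough above concrete, here is a minimal Python sketch of 
it. Everything in it is an assumption made for illustration (the Consumer 
class, rebalance, tied_claims, and the round-robin placement); it is not the 
Sticky Assignor code, only a model where the leader derives the next 
generation from the highest generation reported by the members present.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    name: str
    generation: int = -1              # -1 means "no assignment seen yet"
    owned: set = field(default_factory=set)


def rebalance(members, partitions):
    """Hypothetical leader logic: next generation = highest reported + 1."""
    new_generation = max(m.generation for m in members) + 1
    for m in members:
        m.generation = new_generation
        m.owned = set()
    # Plain round-robin; stickiness is irrelevant to the point being made.
    for i, p in enumerate(sorted(partitions)):
        members[i % len(members)].owned.add(p)


def tied_claims(members):
    """Partitions claimed by several members at the same, highest generation."""
    claims = {}
    for m in members:
        for p in m.owned:
            claims.setdefault(p, []).append((m.generation, m.name))
    tied = {}
    for p, reported in claims.items():
        top = max(g for g, _ in reported)
        names = sorted(n for g, n in reported if g == top)
        if len(names) > 1:
            tied[p] = names
    return tied


a, b = Consumer("A"), Consumer("B")
rebalance([a, b], [0, 1])   # generation 0: A owns 0, B owns 1
rebalance([a], [0, 1])      # B is out: A owns 0,1 at generation 1
rebalance([b], [0, 1])      # A is out, B only knows generation 0: B owns 0,1 at generation 1
conflicts = tied_claims([a, b])
print(a.generation, b.generation, conflicts)
```

In this model both consumers end up reporting generation 1 for both 
partitions, so comparing generations cannot pick a winner.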

Would hooking into the existing generation marker protect the assignor against 
this kind of situation? We need to make sure the selected implementation is 
protected against the kind of failure mentioned above. 
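
As a rough way to reason about that question, here is a small Python sketch 
of the same sequence under one assumption: a single counter, held outside 
the consumers, is incremented on every rebalance and never resets while the 
group exists. The Coordinator class, the dict layout, and resolve are all 
hypothetical names for illustration, and whether the real group generation 
behaves this way in every case (for example after the group becomes empty) 
is exactly what would need confirming.

```python
class Coordinator:
    """Hypothetical stand-in for a coordinator that owns one shared counter."""

    def __init__(self):
        self.generation = -1

    def rebalance(self, members, partitions):
        # The counter advances on every rebalance, whoever is present.
        self.generation += 1
        for m in members:
            m["generation"] = self.generation
            m["owned"] = set()
        for i, p in enumerate(sorted(partitions)):
            members[i % len(members)]["owned"].add(p)


def resolve(members):
    """Per partition, keep the claim with the highest generation (None on a tie)."""
    best = {}
    for m in members:
        for p in m["owned"]:
            gen, _ = best.get(p, (-1, None))
            if m["generation"] > gen:
                best[p] = (m["generation"], m["name"])
            elif m["generation"] == gen:
                best[p] = (gen, None)
    return {p: name for p, (_, name) in best.items()}


a = {"name": "A", "generation": -1, "owned": set()}
b = {"name": "B", "generation": -1, "owned": set()}
coordinator = Coordinator()
coordinator.rebalance([a, b], [0, 1])   # generation 0: A owns 0, B owns 1
coordinator.rebalance([a], [0, 1])      # generation 1: A owns 0,1
coordinator.rebalance([b], [0, 1])      # generation 2: B owns 0,1
owners = resolve([a, b])
print(a["generation"], b["generation"], owners)
```

Under that assumption A's stale claim carries generation 1 and B's carries 
generation 2, so the two claims are no longer tied.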

Also, I have been working on KIP-315, another sticky assignor that also 
requires some kind of epoch/generation marker to protect against zombies. 
So, I’d be in favor of a generic solution here that other assignors can 
leverage. 

Best,

Mike Freyberger

> On Jul 13, 2018, at 6:15 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> 
> wrote:
> 
> Hi all,
> 
> I created a short KIP to address an issue in Sticky Assignor assignment 
> logic: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-341%3A+Update+Sticky+Assignor%27s+User+Data+Protocol
> Please take a look and share your feedback / comments.
> 
> In particular, there is a Generation Marker section (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-341%3A+Update+Sticky+Assignor%27s+User+Data+Protocol#KIP-341:UpdateStickyAssignor'sUserDataProtocol-GenerationMarker
> ) that provides two methods for implementing the improvement to the 
> protocol. I'd like to know which option is more popular.
> 
> Thanks!
> --Vahid
> 
> 
