Hi Mike,

Thanks a lot for reviewing the KIP and sharing your feedback.
I agree that such an issue could surface with option 1, but the existing 
PR (which currently implements this option) checks for such duplicate 
assignments and ignores one in favor of the other. So in the end there 
will be valid (non-duplicate) assignments to consumers, but they might 
deviate a bit from the ideal assignment.
If rare scenarios like this are deemed troublesome and we want to avoid 
them, option 2 would probably be the way to go. In that case, in my 
opinion, option 2 would be a better solution than introducing another 
field (e.g. timestamp).
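
To make the duplicate handling under option 1 concrete, here is a minimal 
sketch. It is not the code from the PR: the Claim type, the resolve method, 
and the tie-break rule (keep the claim seen first) are all assumptions made 
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateClaimSketch {
    /** Hypothetical record of what one consumer reports as its previous assignment. */
    static final class Claim {
        final String consumer;
        final int generation;
        final List<Integer> partitions;

        Claim(String consumer, int generation, List<Integer> partitions) {
            this.consumer = consumer;
            this.generation = generation;
            this.partitions = new ArrayList<>(partitions);
        }
    }

    /**
     * Returns partition -> owner with no duplicates. A claim with a higher
     * generation replaces an older one. On a generation tie the claim seen
     * first is kept and the other is ignored (assumed tie-break rule).
     */
    static Map<Integer, String> resolve(List<Claim> claims) {
        Map<Integer, Claim> winners = new TreeMap<>();
        for (Claim claim : claims) {
            for (int partition : claim.partitions) {
                Claim current = winners.get(partition);
                if (current == null || claim.generation > current.generation) {
                    winners.put(partition, claim);
                }
            }
        }
        Map<Integer, String> owners = new TreeMap<>();
        for (Map.Entry<Integer, Claim> e : winners.entrySet()) {
            owners.put(e.getKey(), e.getValue().consumer);
        }
        return owners;
    }

    public static void main(String[] args) {
        // Both consumers claim both partitions at the same generation.
        List<Claim> claims = Arrays.asList(
            new Claim("A", 1, Arrays.asList(0, 1)),
            new Claim("B", 1, Arrays.asList(0, 1)));
        System.out.println(resolve(claims));
    }
}
```

In the tie case the result is valid (each partition has exactly one owner), 
but the other consumer starts with nothing sticky, which is the deviation 
from the ideal assignment mentioned above.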

Regards.
--Vahid



From:   Mike Freyberger <mfreyber...@appnexus.com>
To:     "dev@kafka.apache.org" <dev@kafka.apache.org>
Date:   07/13/2018 08:42 PM
Subject:        Re: [DISCUSS] KIP-341: Update Sticky Assignor's User Data 
Protocol



This is great!

For the client side implementation, I think it’s still possible for there 
to be a duplication. I’ll try to walk through the example here. 

Let’s say there are 2 consumers and 1 topic with 2 partitions. 

After the initial rebalance, generation 0:
Consumer A has partition 0
Consumer B has partition 1

Let’s say consumer B leaves the group (e.g. a long debugging session or a 
GC pause). This leads to another rebalance. This rebalance will be 
considered generation 1 and will result in:

Generation 1, Consumer A owns partition 0,1

Now let’s say Consumer B is still out of the group and then Consumer A 
leaves as well. While Consumer A is out of the group, Consumer B rejoins 
the group. During this rebalance, the only previous state would be the 
initial generation 0 assignment. So this assignment would be considered 
generation 1 as well and would result in:

Generation 1, Consumer B owns partition 0,1

When A rejoins the group, both consumers would claim ownership of both 
partitions and they would report the assignment was from generation 1. 
This gets us back into the same issue as before because the generation 
number cannot help at all. You could add a timestamp in addition to the 
generation marker, but that’d still be vulnerable to clock skew.
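
The sequence above can be traced with a small sketch. It assumes a 
hypothetical client-side rule in which the new generation is the highest 
generation reported by the members currently in the group, plus one. The 
class and method names are made up for illustration and are not taken from 
the KIP or the PR.

```java
import java.util.Collection;
import java.util.Collections;

public class GenerationTieSketch {
    /** Assumed rule: next generation = highest generation reported by present members + 1. */
    static int nextGeneration(Collection<Integer> reportedGenerations) {
        int highest = -1; // an empty group starts at generation 0
        for (int generation : reportedGenerations) {
            highest = Math.max(highest, generation);
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        // Generation 0: A has partition 0, B has partition 1.
        int generationA = 0;
        int generationB = 0;

        // B leaves. Only A reports its state, so A gets partitions 0,1.
        generationA = nextGeneration(Collections.singletonList(generationA));

        // A leaves and B rejoins. Only B's stale generation 0 state is known,
        // so B gets partitions 0,1 under the same generation number.
        generationB = nextGeneration(Collections.singletonList(generationB));

        System.out.println("A claims partitions 0,1 at generation " + generationA);
        System.out.println("B claims partitions 0,1 at generation " + generationB);
    }
}
```

Under that assumed rule both consumers end up reporting the same generation 
for the same partitions, so comparing generations cannot pick a winner.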

Would hooking into the existing generation marker protect the assignor 
against this kind of situation? We need to make sure the selected 
implementation is protected against the kind of failure mentioned above. 

Also, I have been working on KIP-315, another sticky assignor that also 
requires some kind of epoch/generation marker to protect against zombies. 
So, I’d be in favor of a generic solution here that other assignors can 
leverage. 

Best,

Mike Freyberger

> On Jul 13, 2018, at 6:15 PM, Vahid S Hashemian 
<vahidhashem...@us.ibm.com> wrote:
> 
> Hi all,
> 
> I created a short KIP to address an issue in Sticky Assignor assignment 
> logic: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-341%3A+Update+Sticky+Assignor%27s+User+Data+Protocol
> 
> Please take a look and share your feedback / comments.
> 
> In particular, there is a Generation Marker section 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-341%3A+Update+Sticky+Assignor%27s+User+Data+Protocol#KIP-341:UpdateStickyAssignor'sUserDataProtocol-GenerationMarker) 
> that provides two methods for implementing the improvement to the 
> protocol. I'd like to know which option is more popular.
> 
> Thanks!
> --Vahid
> 
> 




