[ 
https://issues.apache.org/jira/browse/KAFKA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870581#comment-17870581
 ] 

David Jacot commented on KAFKA-17116:
-------------------------------------

Thanks for the discussion and sorry for my late reply. I have had a busy week.

I agree with [~chia7712] that it is something worth fixing/improving.

Overall, I think that it is clear to everyone that using a temporary ID is the 
least desirable option. We can probably discard it.

If we want to go with option 1, which I proposed, I think that we need a KIP 
for it in order to make the choice clear for all the folks implementing clients. 
I would also require a member id from version 2 of the request, etc. We 
also need to discuss it with [~emasab]. For this option, we basically need to 
convince ourselves that generating the member id on the client is safe from a 
collision point of view. There is actually another motivation for this change 
which is even more important than this bug. At the moment, the first HB is not 
idempotent. If the member is created on the first request but the response is 
lost for some reason, the client will retry the HB request and a new member 
will be created. If this happens multiple times, multiple "ghost" members will 
be created, and each one is only removed when its session timeout expires. 
Until then, they still own partitions. Generating the member id on the 
client would resolve this bigger issue.
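To make the idempotency argument concrete, here is a minimal Python sketch under stated assumptions: `ToyCoordinator`, `heartbeat` and `join_with_lost_responses` are hypothetical names, not Kafka's API, and the model only covers registering a member on the first HB (epoch 0) and leaving (epoch -1). It compares a join whose response is lost and retried when the broker assigns the member id against the case where the client generates it up front with `uuid.uuid4()` (a random version-4 UUID, which carries 122 random bits).

```python
import uuid
from typing import Dict, Optional


class ToyCoordinator:
    """Hypothetical, heavily simplified coordinator (not Kafka's API)."""

    def __init__(self) -> None:
        # member id -> member epoch; only registration and removal are modeled
        self.members: Dict[str, int] = {}

    def heartbeat(self, member_id: Optional[str], member_epoch: int) -> Optional[str]:
        if member_epoch == 0:  # join
            if member_id is None:
                # Broker-generated id: every join attempt registers a brand new
                # member, so a retry after a lost response leaves a "ghost".
                member_id = str(uuid.uuid4())
            # Client-generated id: a retry finds the existing entry and
            # registers nothing new, so the join is idempotent.
            self.members.setdefault(member_id, 1)
            return member_id
        if member_epoch == -1:  # leave group
            if member_id is not None:
                self.members.pop(member_id, None)
            return member_id
        raise NotImplementedError("only join (0) and leave (-1) are modeled")


def join_with_lost_responses(coordinator: ToyCoordinator,
                             client_generates_id: bool, attempts: int) -> int:
    """Send the same join HB `attempts` times, as if every response was lost."""
    member_id = str(uuid.uuid4()) if client_generates_id else None
    for _ in range(attempts):
        coordinator.heartbeat(member_id, 0)  # response dropped, client retries
    return len(coordinator.members)


print(join_with_lost_responses(ToyCoordinator(), False, 3))  # 3 members (2 ghosts)
print(join_with_lost_responses(ToyCoordinator(), True, 3))   # 1 member
```

With a broker-generated id, the two extra registrations would linger (and could hold partitions) until their session timeouts expire; with a client-generated id, the retries are harmless.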

I also wanted to point out that option 2 does not require any changes to 
the protocol. This option would also solve the issue that I just described.

That being said, I wonder if we should keep this Jira focused on improving the 
state machine of the client. For instance, we could consider not sending the 
leave group request while the member is still joining the group. This would 
ensure that we only send a leave group request once the client has joined the 
group. What do you think?
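The idea of deferring the leave while still joining can be sketched as a small client-side state machine. This is a minimal Python sketch, not the real consumer: `ToyMember`, its methods and the reduced set of states are hypothetical, loosely named after the states in the ticket (UNSUBSCRIBED, JOINING, LEAVING), and outgoing HBs are just recorded as `(member_id, member_epoch)` tuples.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class State(Enum):
    UNSUBSCRIBED = auto()
    JOINING = auto()
    STABLE = auto()
    LEAVING = auto()


class ToyMember:
    """Hypothetical client-side member state machine (not the real consumer)."""

    def __init__(self) -> None:
        self.state = State.UNSUBSCRIBED
        self.member_id: Optional[str] = None
        self.leave_pending = False
        # Outgoing heartbeats recorded as (member_id, member_epoch).
        self.sent: List[Tuple[Optional[str], int]] = []

    def subscribe(self) -> None:
        self.state = State.JOINING
        self.sent.append((None, 0))  # first HB to join, no member id yet

    def close(self) -> None:
        if self.state == State.JOINING:
            # Join HB still in flight: defer the leave until the response
            # tells us which member id the broker registered.
            self.leave_pending = True
        elif self.state == State.STABLE:
            self._send_leave()
        # UNSUBSCRIBED / LEAVING: nothing to do

    def on_join_response(self, member_id: str) -> None:
        self.member_id = member_id
        if self.leave_pending:
            self._send_leave()
        else:
            self.state = State.STABLE

    def _send_leave(self) -> None:
        self.state = State.LEAVING
        self.sent.append((self.member_id, -1))  # epoch -1 = leave group
        self.leave_pending = False


m = ToyMember()
m.subscribe()
m.close()                    # closed before the join response arrives
m.on_join_response("m-123")
print(m.sent)                # [(None, 0), ('m-123', -1)]
```

Compared with the behavior described in the ticket, where the leave HB goes out immediately without a member ID and fails with UNKNOWN_MEMBER_ID, the leave here carries the id the broker actually registered. The trade-off is that closing now depends on the join response arriving; a real client would presumably need to bound that wait (for example by the close timeout), which this sketch does not model.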

> New consumer may not send effective leave group if member ID received after 
> close 
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17116
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17116
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.8.0
>            Reporter: Lianet Magrans
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: kip-848-client-support
>             Fix For: 3.9.0
>
>
> If the new consumer is closed after sending a HB to join, but before 
> receiving the response to it, it will send a leave group request without a 
> member ID (which will simply fail with UNKNOWN_MEMBER_ID). As a result, the 
> broker will have a registered new member for which it will never receive a 
> leave request.
>  # consumer.subscribe -> sends HB to join, transitions to JOINING
>  # consumer.close -> will transition to LEAVING and send HB with epoch -1 
> (without waiting for in-flight requests)
>  # consumer receives response to initial HB, containing the assigned member 
> ID. It will simply ignore it because it's not in the group anymore 
> (UNSUBSCRIBED)
> Note that, with the current logic, the expected behavior and main downsides 
> are:
>  # If the member received partitions on the first HB, those partitions won't 
> be reassigned until the rebalance timeout expires (the broker keeps waiting 
> for the closed consumer to reconcile them).
>  # Even if no partitions were assigned to it, the member will remain in the 
> group from the broker's point of view (but not from the client's POV). The 
> member will eventually be kicked out for not sending HBs, but only when its 
> session timeout expires.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
