[ 
https://issues.apache.org/jira/browse/KAFKA-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077505#comment-17077505
 ] 

ASF GitHub Bot commented on KAFKA-9801:
---------------------------------------

guozhangwang commented on pull request #8439: KAFKA-9801: Still trigger 
rebalance when static member joins in CompletingRebalance phase
URL: https://github.com/apache/kafka/pull/8439
 
 
   This is a cherry-pick PR from #8405 to trunk (due to the large divergence we 
cannot do that via git cherry-pick).
   
   * Fix the direct cause of the observed issue on the client side: when a 
heartbeat gets an error and resets the generation, we only need to set the state 
to UNJOINED if it was not already REBALANCING; otherwise, the join-group 
handler would throw the retriable UnjoinedGroupException and force the consumer 
to re-send the join-group request unnecessarily.
   
   * Fix the root cause of the issue on the broker side: we should still 
trigger a rebalance when a static member joins in the CompletingRebalance phase; 
otherwise the member.id would change before the assignment is received from 
the leader, leaving the new member.id with an empty assignment.
   
   * Added log4j entries as a by-product of my investigation.
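
The client-side guard in the first bullet can be sketched roughly as follows. This is a simplified model, not the actual consumer code: the class name, the method names, and the assumption that the join-group handler rejects any state other than REBALANCING are illustrative only.

```java
/** Hypothetical, simplified model of the consumer's member state (not Kafka's actual class). */
final class HeartbeatResetSketch {
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    /** Assumed behaviour before the fix: a heartbeat error always resets the state to UNJOINED. */
    static MemberState resetWithoutGuard(MemberState current) {
        return MemberState.UNJOINED;
    }

    /** Behaviour described in the PR: leave the state alone if a rebalance is already in flight. */
    static MemberState resetWithGuard(MemberState current) {
        return current == MemberState.REBALANCING ? current : MemberState.UNJOINED;
    }

    /**
     * Assumption: the join-group response handler raises the retriable
     * UnjoinedGroupException (forcing a re-join) whenever the member is
     * no longer REBALANCING when the response arrives.
     */
    static boolean joinResponseForcesRejoin(MemberState stateAtResponse) {
        return stateAtResponse != MemberState.REBALANCING;
    }

    public static void main(String[] args) {
        // A heartbeat error arrives while a join-group request is still in flight.
        MemberState inFlight = MemberState.REBALANCING;
        System.out.println(joinResponseForcesRejoin(resetWithoutGuard(inFlight))); // prints true
        System.out.println(joinResponseForcesRejoin(resetWithGuard(inFlight)));    // prints false
    }
}
```

With the guard, the in-flight join-group response is accepted instead of triggering another join-group round trip.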
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Static member could get empty assignment unexpectedly
> -----------------------------------------------------
>
>                 Key: KAFKA-9801
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9801
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, streams
>    Affects Versions: 2.4.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Critical
>             Fix For: 2.5.0
>
>
> Take the following example trace where static members are joining the group:
> 1. A static member with instance A joins the group with an empty member.id; the 
> coordinator generates member.id 1 for A and adds it to the group. The group 
> state is PreparingRebalance.
> 2. The group is formed and moves on to CompletingRebalance.
> 3. Another member joins the group, causing it to transition back to 
> PreparingRebalance, which could potentially send a REBALANCE_IN_PROGRESS to 
> member A as well.
> 4. Member A gets the REBALANCE_IN_PROGRESS error and tries to re-join (again 
> with an empty member.id).
> 5. The group advances to CompletingRebalance again.
> 6. The group gets the second join-group from the known instance A with an 
> empty member.id, generates a new member.id 2, and replaces member.id 1 with it.
> 7. The group gets the assignment from the leader, which only includes member.id 1 
> and not member.id 2.
> 8. The assignment for member.id 1 is dropped on the broker side, while the 
> assignment for member.id 2 is set to an empty byte array.
> 9. The empty byte array is sent back to instance A, causing the 
> following error:
> {code}
> [2020-03-27T21:13:01-05:00] 
> (streams-soak-2-5_soak_i-054b83e98b7ed6285_streamslog) 
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'version': java.nio.BufferUnderflowException
>       at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110)
> {code}
> This error is only triggered when several conditions line up, which is why 
> it was not observed very frequently.
> Personally I think this error is also correlated with the observed 
> https://issues.apache.org/jira/browse/KAFKA-9659, which I will keep 
> investigating (and will update this ticket).
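
The trace above can be replayed with a small, self-contained model. This is only a sketch under stated assumptions: the class, its methods, and the two-byte placeholder payload are hypothetical, it is not the coordinator's actual code, and it assumes the assignment starts with a 2-byte 'version' field.

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replay of the trace; not the group coordinator's real implementation. */
final class StaticMemberTraceSketch {

    /** Leader-side stand-in: gives every known member.id a non-empty placeholder payload. */
    static Map<String, byte[]> assign(Collection<String> memberIds) {
        Map<String, byte[]> assignment = new HashMap<>();
        for (String id : memberIds) {
            assignment.put(id, new byte[] {0, 1}); // placeholder: 2-byte version = 1
        }
        return assignment;
    }

    /** Returns the assignment bytes that instance A ends up receiving. */
    static byte[] assignmentSeenByA(boolean rebalanceOnStaticRejoin) {
        Map<String, String> memberIdByInstance = new HashMap<>();
        int nextId = 1;

        // Steps 1-2: A joins with an empty member.id and is given member.id "1".
        memberIdByInstance.put("A", String.valueOf(nextId++));
        // The leader computes its assignment from the membership it was told about.
        Map<String, byte[]> leaderAssignment = assign(memberIdByInstance.values());

        // Steps 4-6: A re-joins with an empty member.id during CompletingRebalance;
        // member.id "2" replaces member.id "1".
        memberIdByInstance.put("A", String.valueOf(nextId++));
        if (rebalanceOnStaticRejoin) {
            // Proposed fix: the re-join triggers a rebalance, so the leader
            // recomputes the assignment against the current member.ids.
            leaderAssignment = assign(memberIdByInstance.values());
        }

        // Steps 7-8: member.ids missing from the leader's assignment get an empty byte array.
        return leaderAssignment.getOrDefault(memberIdByInstance.get("A"), new byte[0]);
    }

    /** Step 9: reading a 2-byte version from an empty buffer throws BufferUnderflowException. */
    static short readVersion(byte[] assignment) {
        return ByteBuffer.wrap(assignment).getShort();
    }

    public static void main(String[] args) {
        System.out.println(assignmentSeenByA(false).length); // prints 0
        System.out.println(assignmentSeenByA(true).length);  // prints 2
    }
}
```

Calling `readVersion` on the empty array from the first case throws `java.nio.BufferUnderflowException`, which corresponds to the underlying cause named in the stack trace above.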



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
