[ 
https://issues.apache.org/jira/browse/KAFKA-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans updated KAFKA-15832:
-----------------------------------
    Description: 
Currently the reconciliation logic on the client is triggered when a new target 
assignment is received and resolved, or when new unresolved target assignments 
are discovered in metadata.

This could be improved by triggering the reconciliation logic on each poll 
iteration, to reconcile whatever is ready to be reconciled. This would require 
changes to support poll on the MembershipManager, and integrate it with the 
current polling logic in the background thread. Receiving a new target 
assignment from the broker, or resolving new topic names via a metadata update, 
would then only ensure that #assignmentReadyToReconcile is properly updated 
(as is currently done), without triggering the #reconcile() logic, which would 
be left to the #poll() operation.
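
The proposed split can be sketched roughly as follows. This is a minimal, 
synchronous illustration only, not the actual MembershipManager code: the 
class name, the onTargetAssignmentResolved method, and the String partition 
names are assumptions made for the example. Receiving an assignment only 
updates the ready-to-reconcile state, and poll() is the single place that 
triggers reconcile().

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for the membership manager; not Kafka's actual class. */
final class PollDrivenReconcileSketch {
    private final Set<String> currentAssignment = new LinkedHashSet<>();
    private final Set<String> assignmentReadyToReconcile = new LinkedHashSet<>();
    private int reconcileCalls = 0;

    /** A target assignment was received and resolved: only record it, do not reconcile. */
    void onTargetAssignmentResolved(Set<String> resolvedPartitions) {
        assignmentReadyToReconcile.clear();
        assignmentReadyToReconcile.addAll(resolvedPartitions);
    }

    /** Called on every iteration of the background thread loop. */
    void poll() {
        // Reconcile whatever is ready and differs from what is currently owned.
        if (!assignmentReadyToReconcile.equals(currentAssignment)) {
            reconcile();
        }
    }

    /** Assumption: reconciliation is instantaneous here; in practice it is long-running. */
    private void reconcile() {
        reconcileCalls++;
        currentAssignment.clear();
        currentAssignment.addAll(assignmentReadyToReconcile);
    }

    int reconcileCalls() { return reconcileCalls; }

    Set<String> currentAssignment() { return currentAssignment; }

    public static void main(String[] args) {
        PollDrivenReconcileSketch manager = new PollDrivenReconcileSketch();
        manager.onTargetAssignmentResolved(Set.of("tp1")); // state updated, nothing reconciled yet
        manager.poll();                                    // reconciliation happens here
        manager.poll();                                    // nothing pending, no second reconcile
        System.out.println(manager.currentAssignment() + " after "
                + manager.reconcileCalls() + " reconcile call(s)");
    }
}
```

The point of routing everything through poll() in this sketch is that a 
pending assignment is retried on the next iteration regardless of what 
triggered the state change.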

As a result of this task, we should validate that the client always reconciles 
whatever it has pending that has not been removed by the coordinator. This 
should address edge cases where a client might get stuck JOINING/RECONCILING, 
with a pending reconciliation, where null assignments are exchanged between the 
client and the coordinator, while the long-running reconciliation completes. 
Note that currently, the MembershipManager relies on assignment != null to 
trigger the reconciliation of pending assignments. With the current logic, the 
following sequence would leave the client stuck in JOINING:
- client joins, epoch 0
- client receives assignment tp1, stuck RECONCILING, epoch 1
- member gets FENCED on the coord, coord bumps epoch to 2
- client tries to rejoin (JOINING), epoch 0 provided by the client
- client is added to the group (group epoch bumped to 2) and receives the same 
assignment that it is currently trying to reconcile (tp1)
- reconciliation completes; if it completes after the fencing, the client 
discards the result, because it notices that memberHasRejoined 
(memberEpochOnReconciliationStart != memberEpoch).
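
The sequence above can be modelled with a small sketch. Everything in it is 
hypothetical (class name, method names, String partition names, and the -1 
sentinel for "no reconciliation in flight"); it only illustrates the 
memberEpochOnReconciliationStart != memberEpoch check and how a poll-driven 
trigger would pick the pending assignment up again after the stale result is 
discarded, even though later heartbeats carry a null assignment.

```java
import java.util.Set;

/** Hypothetical model of the scenario; none of these names are Kafka's real API. */
final class StaleReconciliationSketch {
    private int memberEpoch = 0;
    private Set<String> target = Set.of();
    private Set<String> owned = Set.of();
    private Set<String> inFlight = Set.of();
    // Epoch captured when the in-flight reconciliation started; -1 when none is in flight.
    private int memberEpochOnReconciliationStart = -1;

    /** Heartbeat response; the assignment is null when the coordinator sends nothing new. */
    void onHeartbeatResponse(int epoch, Set<String> assignmentOrNull) {
        memberEpoch = epoch;
        if (assignmentOrNull != null) {
            target = assignmentOrNull;
        }
    }

    /** Poll-driven trigger: start reconciling if something is pending and nothing is running. */
    void poll() {
        boolean idle = memberEpochOnReconciliationStart < 0;
        if (idle && !target.equals(owned)) {
            memberEpochOnReconciliationStart = memberEpoch;
            inFlight = target;
        }
    }

    /** The long-running reconciliation finished; returns true if its result was applied. */
    boolean onReconciliationCompleted() {
        boolean memberHasRejoined = memberEpochOnReconciliationStart != memberEpoch;
        memberEpochOnReconciliationStart = -1;
        if (memberHasRejoined) {
            return false; // stale result is discarded
        }
        owned = inFlight;
        return true;
    }

    Set<String> owned() { return owned; }

    public static void main(String[] args) {
        StaleReconciliationSketch client = new StaleReconciliationSketch();
        client.onHeartbeatResponse(1, Set.of("tp1")); // receives tp1, epoch 1
        client.poll();                                // reconciliation starts at epoch 1
        client.onHeartbeatResponse(2, Set.of("tp1")); // fenced and rejoined, same assignment
        boolean applied = client.onReconciliationCompleted(); // epochs differ, result dropped
        client.onHeartbeatResponse(2, null);          // later heartbeats carry null assignments
        client.poll();                                // pending tp1 is picked up again
        boolean appliedOnRetry = client.onReconciliationCompleted();
        System.out.println(applied + " " + appliedOnRetry + " " + client.owned());
    }
}
```

Without the poll() call after the discarded result, nothing in this model 
would restart the reconciliation, which mirrors the stuck state described 
above.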

  was:
Currently the reconciliation logic on the client is triggered when a new target 
assignment is received and resolved, or when new unresolved target assignments 
are discovered in metadata.

This could be improved by triggering the reconciliation logic on each poll 
iteration, to reconcile whatever is ready to be reconciled. This would required 
changes to support poll on the MembershipManager, and integrate it with the 
current polling logic in the background thread.

As a result of this task, it should be ensured that the client always 
reconciles whatever it has pending that has not been removed by the 
coordinator. (This should address edge cases where a client might get stuck 
JOINING/RECONCILING, with a pending reconciliation, where null assignments are 
exchanged between the client and the coordinator, while the long-running 
reconciliation completes. Note that currently, the MembershipManager relies on 
assignment != null to trigger the reconciliation of pending assignments) 


> Trigger client reconciliation based on manager poll
> ---------------------------------------------------
>
>                 Key: KAFKA-15832
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15832
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, consumer
>            Reporter: Lianet Magrans
>            Assignee: Lianet Magrans
>            Priority: Major
>              Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
>             Fix For: 3.8.0
>
>
> Currently the reconciliation logic on the client is triggered when a new 
> target assignment is received and resolved, or when new unresolved target 
> assignments are discovered in metadata.
> This could be improved by triggering the reconciliation logic on each poll 
> iteration, to reconcile whatever is ready to be reconciled. This would 
> require changes to support poll on the MembershipManager, and integrate it 
> with the current polling logic in the background thread. Receiving a new 
> target assignment from the broker, or resolving new topic names via a 
> metadata update, would then only ensure that #assignmentReadyToReconcile is 
> properly updated (as is currently done), without triggering the 
> #reconcile() logic, which would be left to the #poll() operation.
> As a result of this task, we should validate that the client always 
> reconciles whatever it has pending that has not been removed by the 
> coordinator. This should address edge cases where a client might get stuck 
> JOINING/RECONCILING, with a pending reconciliation, where null assignments 
> are exchanged between the client and the coordinator, while the long-running 
> reconciliation completes. Note that currently, the MembershipManager relies 
> on assignment != null to trigger the reconciliation of pending assignments. 
> With the current logic, the following sequence would leave the client 
> stuck in JOINING:
> - client joins, epoch 0
> - client receives assignment tp1, stuck RECONCILING, epoch 1
> - member gets FENCED on the coord, coord bumps epoch to 2
> - client tries to rejoin (JOINING), epoch 0 provided by the client
> - client is added to the group (group epoch bumped to 2) and receives the 
> same assignment that it is currently trying to reconcile (tp1)
> - reconciliation completes; if it completes after the fencing, the client 
> discards the result, because it notices that memberHasRejoined 
> (memberEpochOnReconciliationStart != memberEpoch).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
