[ 
https://issues.apache.org/jira/browse/KAFKA-17170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans updated KAFKA-17170:
-----------------------------------
    Description: 
When a consumer reconciles an assignment, it transitions to ACKNOWLEDGING so 
that a heartbeat (HB) is sent on the next manager poll, without waiting for the 
interval. The consumer transitions out of this ack state as soon as it sends 
the heartbeat, without waiting for a response. This relies on the expectation 
that the following heartbeats (sent on the interval) will act as the ack and 
include the set of partitions, even if the first ack is lost. This is the 
expected flow:
 # complete reconciliation and send HB1 to ack assignment tp0 
 # HB1 times out (or fails in any way) => the heartbeat request manager resets 
the sentFields to null (HeartbeatState.reset(), triggered if the request fails 
or if it gets a response with an Error)
 # the following HB will include tp0 (and act as ack), because it will notice 
that tp0 != null (the last value sent)

This does not seem to be covered by any test, so we should add a unit test to 
the HeartbeatRequestManager to ensure that the HB generated in step 3 above 
includes tp0 as expected, considering both error cases: the request fails (no 
response) and the request gets a response with an Error in it. 
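
To make the expected flow concrete, here is a minimal, self-contained Java 
sketch of the behaviour such a test would assert. It is a simplified model, not 
the actual Kafka client code: HeartbeatStateSketch, FailureModeSketch and 
AckRetrySketch are hypothetical names, and the real test would presumably 
exercise HeartbeatRequestManager directly. The only assumption taken from the 
description is that a failed request or an error response resets the last sent 
value to null.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Simplified stand-in for the heartbeat state. Hypothetical, NOT the real Kafka class. */
final class HeartbeatStateSketch {
    // Assignment the member owns after the latest reconciliation.
    private Set<String> currentAssignment = new TreeSet<>();
    // Last assignment included in a heartbeat; null means "nothing sent".
    private Set<String> sentAssignment = null;

    void onReconciliationCompleted(Set<String> assignment) {
        currentAssignment = new TreeSet<>(assignment);
    }

    /** Returns the assignment to include in the next HB, or empty if it matches the last one sent. */
    Optional<Set<String>> buildHeartbeat() {
        if (currentAssignment.equals(sentAssignment)) {
            return Optional.empty();
        }
        sentAssignment = new TreeSet<>(currentAssignment);
        return Optional.of(new TreeSet<>(currentAssignment));
    }

    /** Models the reset described in step 2: forget what was sent. */
    void reset() {
        sentAssignment = null;
    }
}

/** The two error cases the issue asks the test to cover (hypothetical enum). */
enum FailureModeSketch { NO_RESPONSE, ERROR_RESPONSE }

final class AckRetrySketch {
    /** Runs steps 1-3 and returns the assignment carried by the HB that follows the failed ack. */
    static Optional<Set<String>> heartbeatAfterFailedAck(FailureModeSketch mode) {
        HeartbeatStateSketch state = new HeartbeatStateSketch();
        state.onReconciliationCompleted(Collections.singleton("tp0"));

        // Step 1: HB1 carries tp0 as the ack.
        if (!state.buildHeartbeat().isPresent()) {
            throw new IllegalStateException("HB1 should include the reconciled assignment");
        }

        // Step 2: in this model both failure modes trigger the same reset.
        switch (mode) {
            case NO_RESPONSE:
            case ERROR_RESPONSE:
                state.reset();
                break;
        }

        // Step 3: the following HB should include tp0 again.
        return state.buildHeartbeat();
    }

    /** Contrast case: with no failure and no reset, the next HB omits the assignment. */
    static Optional<Set<String>> heartbeatAfterSuccessfulAck() {
        HeartbeatStateSketch state = new HeartbeatStateSketch();
        state.onReconciliationCompleted(Collections.singleton("tp0"));
        state.buildHeartbeat();
        return state.buildHeartbeat();
    }

    public static void main(String[] args) {
        for (FailureModeSketch mode : FailureModeSketch.values()) {
            System.out.println(mode + " -> next HB assignment: " + heartbeatAfterFailedAck(mode));
        }
        System.out.println("no failure -> next HB assignment: " + heartbeatAfterSuccessfulAck());
    }
}
```

The contrast case shows why the reset matters: without it, the model would 
treat tp0 as already sent and leave it out of the following heartbeat.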

This flow is important because if the member fails to send the reconciled 
partitions in a HB, the broker would remain waiting for an ack that the member 
considers already sent (the broker would wait for the rebalance timeout before 
re-assigning those partitions).

  was:
When a consumer reconciles an assignment, it transitions to ACKNOWLEDGING so 
that a heartbeat (HB) is sent on the next manager poll, without waiting for the 
interval. The consumer transitions out of this ack state as soon as it sends 
the heartbeat, without waiting for a response. This relies on the expectation 
that the following heartbeats (sent on the interval) will act as the ack and 
include the set of partitions, even if the first ack is lost. This is the 
expected flow:
 # complete reconciliation and send HB1 to ack assignment tp0 
 # HB1 times out (or fails in any way) => the heartbeat request manager resets 
the sentFields to null (HeartbeatState.reset(), triggered if the request fails 
or if it gets a response with an Error)
 # the following HB will include tp0 (and act as ack), because it will notice 
that tp0 != null (the last value sent)

This does not seem to be covered by any test, so we should add a unit test to 
the HeartbeatRequestManager to ensure that the HB generated in step 3 above 
includes tp0 as expected.

This flow is important because if the member fails to send the reconciled 
partitions in a HB, the broker would remain waiting for an ack that the member 
considers already sent (the broker would wait for the rebalance timeout before 
re-assigning those partitions).


> Add test to ensure new consumer acks reconciled assignment even if first HB 
> with ack lost
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17170
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17170
>             Project: Kafka
>          Issue Type: Task
>          Components: clients, consumer
>            Reporter: Lianet Magrans
>            Priority: Minor
>              Labels: kip-848-client-support, newbie



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
