Kirk True created KAFKA-16555:
---------------------------------

             Summary: Consumer's RequestState has incorrect logic to determine if inflight
                 Key: KAFKA-16555
                 URL: https://issues.apache.org/jira/browse/KAFKA-16555
             Project: Kafka
          Issue Type: Task
          Components: clients, consumer
    Affects Versions: 3.7.0
            Reporter: Kirk True
            Assignee: Kirk True
             Fix For: 3.8.0


When running system tests for the new consumer, I've hit an issue where the 
{{HeartbeatRequestManager}} is sending out multiple concurrent 
{{CONSUMER_GROUP_REQUEST}} RPCs. The effect is that the coordinator creates 
multiple members, which causes downstream assignment problems.

Here's the order of events:

* Time 202: {{HeartbeatRequestManager.poll()}} determines it's OK to send a 
request. In so doing, it updates the {{RequestState}}'s {{lastSentMs}} to the 
current timestamp, 202
* Time 236: the response is received and the response handler is invoked, setting 
the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236
* Time 236: {{HeartbeatRequestManager.poll()}} is invoked again, and it sees 
that it's OK to send a request. It creates another request, once again updating 
the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236
* Time 237: {{HeartbeatRequestManager.poll()}} is invoked again, and 
ERRONEOUSLY decides it's OK to send another request, despite one already being 
in flight.

Here's the problem with {{requestInFlight()}}:

{code:java}
public boolean requestInFlight() {
    return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs;
}
{code}

In our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236. So the 
received timestamp is _equal_ to the sent timestamp, not _less_, and 
{{requestInFlight()}} returns false even though the second request has not yet 
been answered.
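
The timeline above can be replayed with a minimal, self-contained sketch. The classes below are simplified stand-ins, not the actual Kafka {{RequestState}}: {{TimestampRequestState}} copies only the comparison quoted in this report, and {{FlagRequestState}} is a hypothetical variant that tracks an explicit boolean instead of comparing timestamps. The {{onSend}}/{{onReceive}} method names and the flag are assumptions made for illustration, and this is one possible approach rather than a description of the eventual fix.

```java
// Simplified stand-in for the consumer's RequestState (NOT the real Kafka class).
// It keeps only the two timestamps and the comparison quoted in the report.
class TimestampRequestState {
    long lastSentMs = -1;
    long lastReceivedMs = -1;

    // Hypothetical helper names, used here only to replay the timeline.
    void onSend(long nowMs) { lastSentMs = nowMs; }
    void onReceive(long nowMs) { lastReceivedMs = nowMs; }

    // Same comparison as in the report: strict "less than" on millisecond timestamps.
    boolean requestInFlight() {
        return lastSentMs > -1 && lastReceivedMs < lastSentMs;
    }
}

// Hypothetical variant (an assumption): track the in-flight status explicitly,
// so that a send and a receive in the same millisecond cannot be confused.
class FlagRequestState {
    long lastSentMs = -1;
    long lastReceivedMs = -1;
    private boolean inFlight = false;

    void onSend(long nowMs) { lastSentMs = nowMs; inFlight = true; }
    void onReceive(long nowMs) { lastReceivedMs = nowMs; inFlight = false; }

    boolean requestInFlight() { return inFlight; }
}

class RequestStateDemo {
    public static void main(String[] args) {
        // Replay the reported order of events against both variants.
        TimestampRequestState ts = new TimestampRequestState();
        ts.onSend(202);     // Time 202: first request sent
        ts.onReceive(236);  // Time 236: response received
        ts.onSend(236);     // Time 236: second request sent in the same millisecond

        FlagRequestState fs = new FlagRequestState();
        fs.onSend(202);
        fs.onReceive(236);
        fs.onSend(236);

        // Time 237: a request is in flight in both cases.
        System.out.println("timestamp check: " + ts.requestInFlight()); // prints false
        System.out.println("explicit flag:   " + fs.requestInFlight()); // prints true
    }
}
```

With the timestamp comparison, the second request goes undetected because 236 < 236 is false. The flag-based variant does not depend on clock resolution, at the cost of one more field that must be reset on every completion path (success, error, or timeout).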



--
This message was sent by Atlassian Jira
(v8.20.10#820010)