Kirk True created KAFKA-16555: --------------------------------- Summary: Consumer's RequestState has incorrect logic to determine if inflight Key: KAFKA-16555 URL: https://issues.apache.org/jira/browse/KAFKA-16555 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0
When running system tests for the new consumer, I've hit an issue where the {{HeartbeatRequestManager}} is sending out multiple concurrent {{CONSUMER_GROUP_REQUEST}} RPCs. The effect is the coordinator creates multiple members which causes downstream assignment problems. Here's the order of events: * Time 202: {{HearbeatRequestManager.poll()}} determines it's OK to send a request. In so doing, it updates the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 202 * Time 236: the response is received and response handler is invoked, setting the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236 * Time 236: {{HearbeatRequestManager.poll()}} is invoked again, and it sees that it's OK to send a request. It creates another request, once again updating the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236 * Time 237: {{HearbeatRequestManager.poll()}} is invoked again, and ERRONEOUSLY decides it's OK to send another request, despite one already in flight. Here's the problem with {{requestInFlight()}}: {code:java} public boolean requestInFlight() { return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs; } {code} On our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236. So the received timestamp is _equal_ to the sent timestamp, not _less_. -- This message was sent by Atlassian Jira (v8.20.10#820010)