Manikumar created KAFKA-19561:
---------------------------------
Summary: Request Timeout During SASL Reauthentication Due to Missed OP_WRITE interest set
Key: KAFKA-19561
URL: https://issues.apache.org/jira/browse/KAFKA-19561
Project: Kafka
Issue Type: Bug
Reporter: Manikumar
Assignee: Manikumar
We've observed request timeouts occurring during SASL reauthentication, and
analysis suggests the issue is caused by a race condition between request
handling and reauthentication on the broker side. Here’s the sequence:
# Client sends a request ({{Req1}}) to the broker.
# Client begins SASL reauthentication.
# Broker receives {{Req1}}.
# Broker also initiates SASL reauthentication.
# While reauth is in progress:
** Broker processes {{Req1}}, prepares {{Res1}}, and queues it via {{KafkaChannel.send()}}.
** Broker sets {{SelectionKey.OP_WRITE}} to indicate write readiness.
** However, {{Selector.attemptWrite()}} skips the send because:
*** {{channel.hasSend()}} is true, but
*** {{channel.ready()}} is false (since reauth is not yet complete).
# After reauth completes, broker removes {{OP_WRITE}} from the selection key.
# At this point:
** {{Res1}} is still pending in the channel.
** {{channel.hasSend()}} and {{channel.ready()}} are now both true.
** However, {{key.isWritable()}} is false, so no further write is attempted.
# The response remains stuck in the channel, and the client eventually hits a request timeout.
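To make the sequence above easier to follow, here is a minimal single-threaded model of the broker-side write path. {{ModelChannel}}, {{attemptWrite}}, {{finishReauth}} and {{replay}} are hypothetical stand-ins, not Kafka classes; they only mimic the {{hasSend()}}/{{ready()}}/{{OP_WRITE}} checks from steps 5 to 8, using the real {{java.nio.channels.SelectionKey.OP_WRITE}} bit. As a simplifying assumption, the model treats "writable" as "OP_WRITE interest is set", collapsing interest and readiness into one flag.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the broker-side race (not Kafka code). */
final class ReauthRaceModel {

    /** Stand-in for a channel: one pending send slot plus a reauth flag. */
    static final class ModelChannel {
        String pendingSend;        // queued response, or null if none
        boolean reauthInProgress;  // ready() is false while this is true
        int interestOps;           // mirrors SelectionKey interest bits
        final List<String> written = new ArrayList<>();

        boolean hasSend() { return pendingSend != null; }
        boolean ready() { return !reauthInProgress; }
        // Simplification: treat "OP_WRITE interest set" as "writable".
        boolean isWritable() { return (interestOps & SelectionKey.OP_WRITE) != 0; }

        /** Step 5: queue the response and set OP_WRITE interest. */
        void send(String response) {
            pendingSend = response;
            interestOps |= SelectionKey.OP_WRITE;
        }
    }

    /** Mimics the guard from step 5: write only if writable, has a send, and ready. */
    static void attemptWrite(ModelChannel ch) {
        if (ch.isWritable() && ch.hasSend() && ch.ready()) {
            ch.written.add(ch.pendingSend);
            ch.pendingSend = null;
            ch.interestOps &= ~SelectionKey.OP_WRITE; // nothing left to write
        }
    }

    /** Step 6 as described in the issue: reauth completes and OP_WRITE is removed. */
    static void finishReauth(ModelChannel ch) {
        ch.reauthInProgress = false;
        ch.interestOps &= ~SelectionKey.OP_WRITE;
    }

    /** Replays steps 5 to 8 and returns the channel for inspection. */
    static ModelChannel replay() {
        ModelChannel ch = new ModelChannel();
        ch.reauthInProgress = true; // reauth has already started
        ch.send("Res1");            // broker queues Res1 during reauth
        attemptWrite(ch);           // skipped: ready() is false
        finishReauth(ch);           // OP_WRITE interest cleared
        attemptWrite(ch);           // skipped: isWritable() is false
        return ch;
    }
}
```

After {{replay()}}, the model channel reports {{hasSend()}} and {{ready()}} as true but {{isWritable()}} as false, so further calls to {{attemptWrite}} never flush {{Res1}}; this is the stuck state from step 8.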
The fix is to set {{SelectionKey.OP_WRITE}} interest again at the end of step 6, so that the selector attempts to write the pending {{Res1}} once the channel is ready. This is similar to [what we do on the client side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].
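The proposed fix can be sketched with plain {{java.nio}}, under stated assumptions: {{ReauthWriteInterestFix}}, {{onReauthComplete}} and {{demo}} are hypothetical names, and the real change would live in Kafka's broker-side code rather than touching a {{SelectionKey}} directly like this. The issue only says to set {{OP_WRITE}} at the end of step 6; gating it on a pending send ({{hasPendingSend}}) is an assumption added here. A non-blocking {{Pipe}} sink stands in for the client connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical sketch of the proposed fix using plain java.nio (not Kafka code). */
final class ReauthWriteInterestFix {

    /**
     * Called at the end of step 6, after reauthentication completes. If a
     * response is still queued (assumption: the caller knows this), re-register
     * OP_WRITE so the selector will attempt the write on a later poll.
     */
    static void onReauthComplete(SelectionKey key, boolean hasPendingSend) {
        if (hasPendingSend && key.isValid()) {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        }
    }

    /** Demo: returns the key's interest ops after applying the fix. */
    static int demo(boolean hasPendingSend) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                sink.configureBlocking(false); // required before register()
                // Start with no interest, as after OP_WRITE was removed in step 6.
                SelectionKey key = sink.register(selector, 0);
                onReauthComplete(key, hasPendingSend);
                return key.interestOps();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

{{demo(true)}} yields an interest set containing only {{OP_WRITE}}, while {{demo(false)}} leaves it empty. Re-adding interest conditionally avoids spurious write wakeups when nothing is queued; whether the actual patch includes such a check is not stated in the issue.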
--
This message was sent by Atlassian Jira
(v8.20.10#820010)