[ 
https://issues.apache.org/jira/browse/KAFKA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar updated KAFKA-19561:
------------------------------
    Description: 
We've observed request timeouts occurring during SASL reauthentication, and 
analysis suggests the issue is caused by a race condition between request 
handling and reauthentication on the broker side. Here’s the sequence:
 # Client sends a request (Req1) to the broker.
 # Client initiates SASL reauthentication.
 # Broker receives Req1.
 # Broker also begins SASL reauthentication.
 # While reauth is in progress:
 ** Broker completes processing of Req1 and prepares a response (Res1).
 ** Res1 is queued via KafkaChannel.send().
 ** Broker sets SelectionKey.OP_WRITE to indicate write readiness.
 ** However, Selector.attemptWrite() does not proceed because:
 *** channel.hasSend() is true, but
 *** channel.ready() is false (reauth is still in progress).
 # Once reauthentication completes, the broker removes SelectionKey.OP_WRITE from the key's interest set.
 # At this point:
 ** channel.hasSend() and channel.ready() are now true,
 ** But key.isWritable() is false, so the response (Res1) is never sent.
 # The response remains stuck in the send buffer. Client eventually hits a 
request timeout.
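
The sequence above can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyChannel, attemptWrite() and run() are hypothetical stand-ins, not Kafka's KafkaChannel or Selector, and key.isWritable() is approximated as "OP_WRITE is in the interest set", which assumes the socket always has buffer space.

```java
// Toy model of steps 4-8; none of these types are Kafka classes.
final class StuckResponseSketch {
    static final int OP_WRITE = java.nio.channels.SelectionKey.OP_WRITE;

    /** Hypothetical stand-in for the per-connection state discussed above. */
    static final class ToyChannel {
        boolean hasSend;   // a response is queued on the channel
        boolean ready;     // false while (re)authentication is in progress
        int interestOps;   // bit set, as in SelectionKey.interestOps()
    }

    /** Approximates key.isWritable(): assumes the socket can always accept data. */
    static boolean isWritable(ToyChannel ch) {
        return (ch.interestOps & OP_WRITE) != 0;
    }

    /** Writes only when all three conditions from the description hold. */
    static boolean attemptWrite(ToyChannel ch) {
        if (ch.hasSend && ch.ready && isWritable(ch)) {
            ch.hasSend = false;            // response flushed
            ch.interestOps &= ~OP_WRITE;   // nothing left to write
            return true;
        }
        return false;
    }

    /** Replays the sequence; returns true if Res1 was ever written. */
    static boolean run() {
        ToyChannel ch = new ToyChannel();
        ch.ready = false;                  // step 4: reauth in progress
        ch.hasSend = true;                 // step 5: Res1 queued
        ch.interestOps |= OP_WRITE;        // step 5: write interest set
        boolean sentDuringReauth = attemptWrite(ch); // skipped: not ready

        ch.ready = true;                   // step 6: reauth completes
        ch.interestOps &= ~OP_WRITE;       // step 6: write interest removed
        boolean sentAfterReauth = attemptWrite(ch);  // skipped: not writable

        return sentDuringReauth || sentAfterReauth;
    }

    public static void main(String[] args) {
        System.out.println("Res1 sent: " + run()); // prints "Res1 sent: false"
    }
}
```

In this model no later event sets OP_WRITE again, so Res1 stays queued indefinitely, which corresponds to the client-side request timeout.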

The fix is to set the SelectionKey.OP_WRITE interest again at the end of 
step 6, so that the pending response is written once reauthentication 
completes. This is similar to [what we do on client 
side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].
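
A minimal sketch of the proposed change, again on a toy model rather than Kafka's real classes: ToyChannel, completeReauth() and the restoreWriteInterest flag are hypothetical names introduced for illustration, and writability is assumed to follow directly from the interest set. It is not the actual patch.

```java
// Toy comparison of step 6 with and without re-adding OP_WRITE; not Kafka code.
final class ReauthFixSketch {
    static final int OP_WRITE = java.nio.channels.SelectionKey.OP_WRITE;

    /** Hypothetical stand-in for the channel and its selection key. */
    static final class ToyChannel {
        boolean hasSend;   // a response is queued
        boolean ready;     // authentication state
        int interestOps;   // interest set bits
    }

    /** Step 6: finish reauthentication, optionally restoring write interest. */
    static void completeReauth(ToyChannel ch, boolean restoreWriteInterest) {
        ch.ready = true;
        ch.interestOps &= ~OP_WRITE;       // current behaviour described in step 6
        if (restoreWriteInterest) {
            ch.interestOps |= OP_WRITE;    // proposed: set write interest at the end of step 6
        }
    }

    /** One selector pass: send only if queued, ready, and write interest is set. */
    static boolean poll(ToyChannel ch) {
        boolean writable = (ch.interestOps & OP_WRITE) != 0; // assumes socket has room
        if (ch.hasSend && ch.ready && writable) {
            ch.hasSend = false;
            ch.interestOps &= ~OP_WRITE;
            return true;
        }
        return false;
    }

    /** Returns true if the response queued during reauth is eventually sent. */
    static boolean run(boolean restoreWriteInterest) {
        ToyChannel ch = new ToyChannel();
        ch.hasSend = true;                 // Res1 queued while reauth is in progress
        ch.interestOps |= OP_WRITE;
        poll(ch);                          // no-op: channel is not ready yet
        completeReauth(ch, restoreWriteInterest);
        return poll(ch);
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + run(false)); // prints "without fix: false"
        System.out.println("with fix:    " + run(true));  // prints "with fix:    true"
    }
}
```

The sketch sets OP_WRITE unconditionally when reauthentication finishes; whether the real change should do so only when a send is pending is left open here.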

  was:
We've observed request timeouts occurring during SASL reauthentication, and 
analysis suggests the issue is caused by a race condition between request 
handling and reauthentication on the broker side. Here’s the sequence:
 # Client sends a request ({{Req1}}) to the broker.
 # Client begins SASL reauthentication.
 # Broker receives {{Req1}}.
 # Broker also initiates SASL reauthentication.
 # While reauth is in progress:
 ** Broker processes {{Req1}}, prepares {{Res1}}, and queues it via 
{{KafkaChannel.send()}}.
 ** Broker sets {{SelectionKey.OP_WRITE}} to indicate write readiness.
 ** However, {{Selector.attemptWrite()}} skips the send because:
 *** {{channel.hasSend()}} is true, but
 *** {{channel.ready()}} is false (since reauth is not yet complete).
 # After reauth completes, broker removes {{OP_WRITE}} from the selection key.
 # At this point:
 ** {{Res1}} is still pending in the channel.
 ** {{channel.hasSend()}} and {{channel.ready()}} are now true,
 ** But {{key.isWritable()}} is false, so no further write is attempted.
 # The response remains stuck in the send buffer. Client eventually hits 
a request timeout.

The fix is to set write readiness using SelectionKey.OP_WRITE at the end of 
Step 6. This is similar to [what we do on client 
side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].


> Request Timeout During SASL Reauthentication Due to Missed OP_WRITE interest 
> set
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-19561
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19561
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Manikumar
>            Assignee: Manikumar
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
