[ 
https://issues.apache.org/jira/browse/KAFKA-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773463#comment-17773463
 ] 

Sankalp Bhatia edited comment on KAFKA-15565 at 10/9/23 7:34 PM:
-----------------------------------------------------------------

Thanks. The reason I call it a bug is that the override you mentioned in 
[https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L571]
 takes the min of

1. the default API timeout (60s)

2. metadataTimeout: I believe this is derived from the default request timeout 
(hardcoded to 1hr) when a metadata request is pending.

3. the default request timeout (hardcoded to 1hr, although per the contract it 
should be derived from the client config)

Now consider a case where the client is unable to create a socket connection. 
Ideally, such a case should be handled in line 
[584|https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L584],
 but since the selector enters a long poll of 60s (this can happen when no 
other selection key is ready), it only learns about the timed-out connection 
after 60s, and by that time the client call has to be dropped without any 
retries. Had the AdminClient honored the request timeout, the poll would have 
been shorter and the request could have been retried.
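
To make the arithmetic above concrete, here is a minimal sketch of how the poll duration falls out of the three values. It is not the actual NetworkClient code: the class and method names are hypothetical, and the 30s value only stands in for a configured request.timeout.ms.

```java
public final class PollTimeoutSketch {

    // Hypothetical helper mirroring the "min of three timeouts" described above.
    // This is an illustration only, not the NetworkClient implementation.
    public static long pollTimeoutMs(long apiTimeoutMs,
                                     long metadataTimeoutMs,
                                     long defaultRequestTimeoutMs) {
        return Math.min(apiTimeoutMs, Math.min(metadataTimeoutMs, defaultRequestTimeoutMs));
    }

    public static void main(String[] args) {
        long apiTimeoutMs = 60_000L;   // default api timeout (60s), as stated in the comment
        long oneHourMs = 3_600_000L;   // the hardcoded 1hr value

        // With the hardcoded value, the 60s api timeout is the smallest input,
        // so the selector may block for the full 60s.
        long hardcoded = pollTimeoutMs(apiTimeoutMs, oneHourMs, oneHourMs);

        // Assumed example: request.timeout.ms = 30s feeding both the metadata
        // timeout and the default request timeout.
        long fromConfig = pollTimeoutMs(apiTimeoutMs, 30_000L, 30_000L);

        System.out.println("poll with hardcoded 1hr: " + hardcoded + " ms");
        System.out.println("poll with request.timeout.ms=30000: " + fromConfig + " ms");
    }
}
```

With the hardcoded value the poll is bounded only by the 60s api timeout, so no time is left for a retry. With a shorter request timeout the poll returns earlier and leaves room for one.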


> KafkaAdminClient does not honor request timeout ms 
> ---------------------------------------------------
>
>                 Key: KAFKA-15565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sankalp Bhatia
>            Assignee: Sankalp Bhatia
>            Priority: Minor
>
> It seems to me there is a bug in this line in the KafkaAdminClient. For the 
> constructor arg defaultRequestTimeoutMs of NetworkClient [1], it uses a 
> hardcoded value of 1 hour. Ideally, this should be derived from the client 
> config "request.timeout.ms" from the AdminClientConfig [2]. 
> References
> [1][https://github.com/apache/kafka/blob/1c3eb4395a15cf4f45b6dc0d39effb3dc087f5a4/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L521]
> [2][https://github.com/apache/kafka/blob/1c3eb4395a15cf4f45b6dc0d39effb3dc087f5a4/clients/src/main/java/org/apache/kafka/clients/admin/AdminClientConfig.java#L98]
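
A rough sketch of the change the description asks for, reading the timeout from the client config instead of using a fixed 1 hour. It uses only the JDK: the class, method, and fallback constant are hypothetical and not part of Kafka, and the 30s fallback is an assumption for illustration.

```java
import java.util.Map;

public final class RequestTimeoutSketch {

    // Config key named in the issue description.
    static final String REQUEST_TIMEOUT_MS_CONFIG = "request.timeout.ms";

    // Assumed fallback for this sketch only.
    static final int ASSUMED_DEFAULT_REQUEST_TIMEOUT_MS = 30_000;

    // Hypothetical helper: derive the value that would be passed as
    // defaultRequestTimeoutMs from the client config rather than a constant.
    public static int defaultRequestTimeoutMs(Map<String, ?> clientConfig) {
        Object value = clientConfig.get(REQUEST_TIMEOUT_MS_CONFIG);
        if (value == null) {
            return ASSUMED_DEFAULT_REQUEST_TIMEOUT_MS;
        }
        return Integer.parseInt(value.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(REQUEST_TIMEOUT_MS_CONFIG, "15000");
        System.out.println("derived timeout: " + defaultRequestTimeoutMs(config) + " ms");
        System.out.println("fallback timeout: " + defaultRequestTimeoutMs(Map.of()) + " ms");
    }
}
```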



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
