[ https://issues.apache.org/jira/browse/KAFKA-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773463#comment-17773463 ]
Sankalp Bhatia edited comment on KAFKA-15565 at 10/9/23 7:34 PM:
-----------------------------------------------------------------

Thanks. The reason I say it is a bug is because the overriding you mentioned in [https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L571] takes the min of:
1. default api timeout (60s)
2. metadataTimeout: This I believe is derived from the default request timeout if the metadata request is pending (which is hardcoded to 1hr).
3. default request timeout (hardcoded to 1hr). (But as per contract should be derived from client config)

Now consider a case where the client is unable to create a socket connection. Ideally, such a case should be handled in line [584|https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L584], but since the selector enters a long poll of 60s (this can happen when no other selection key is ready), it only gets to know about the timed out connection after 60s, and by that time the Client Call needs to be dropped without any retries. Had the adminClient honored the request timeout, the poll would have been shorter and the request could have been retried.


was (Author: sankalpbhatia):
Thanks. The reason I say it is a bug is because the overriding you mentioned in [https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L571] takes the min of:
1. default api timeout (60s)
2. metadataTimeout: This I believe is derived from the default request timeout if the metadata request is pending (which is hardcoded to 1hr).
3. default request timeout (hardcoded to 1hr). (But as per contract should be derived from client config)

Now consider a case where the client is unable to create a socket connection. Ideally, such a case should be handled in line [584|https://github.com/apache/kafka/blob/bf51a50a564ee43d3515c82fc706f17325c4602f/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L584], but since the selector enters a long poll of 60s, it only gets to know about the timed out connection after 60s, and by that time the Client Call needs to be dropped without any retries. Had the adminClient honored the request timeout, the poll would have been shorter and the request could have been retried.

> KafkaAdminClient does not honor request timeout ms
> ---------------------------------------------------
>
>                 Key: KAFKA-15565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sankalp Bhatia
>            Assignee: Sankalp Bhatia
>            Priority: Minor
>
> It seems to me there is a bug in this line in the KafkaAdminClient. For the
> constructor arg defaultRequestTimeoutMs of NetworkClient [1], it uses a
> hardcoded value of 1 hour. Ideally, this should be derived from the client
> config "request.timeout.ms" from the AdminClientConfig [2].
> References
> [1] [https://github.com/apache/kafka/blob/1c3eb4395a15cf4f45b6dc0d39effb3dc087f5a4/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L521]
> [2] [https://github.com/apache/kafka/blob/1c3eb4395a15cf4f45b6dc0d39effb3dc087f5a4/clients/src/main/java/org/apache/kafka/clients/admin/AdminClientConfig.java#L98]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
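
To make the arithmetic in the comment above concrete, here is a minimal, self-contained sketch of the "min of three timeouts" reasoning. It is not the NetworkClient code. The class PollTimeoutSketch and the method pollTimeoutMs are hypothetical names, and the 30s value used for request.timeout.ms is an assumption chosen for illustration. The 60s and 1hr values are the ones quoted in the comment.

```java
public class PollTimeoutSketch {

    // Hypothetical helper: returns the smallest of the three timeouts
    // listed in the comment (api timeout, metadata timeout, request timeout).
    public static long pollTimeoutMs(long apiTimeoutMs, long metadataTimeoutMs, long requestTimeoutMs) {
        return Math.min(apiTimeoutMs, Math.min(metadataTimeoutMs, requestTimeoutMs));
    }

    public static void main(String[] args) {
        long defaultApiTimeoutMs = 60_000L;          // 60s, as quoted in the comment
        long hardcodedRequestTimeoutMs = 3_600_000L; // 1hr, as quoted in the comment
        long configuredRequestTimeoutMs = 30_000L;   // ASSUMED request.timeout.ms value, for illustration only

        // Case 1: request timeout (and the metadata timeout derived from it) fixed at 1hr.
        // The api timeout is the smallest value, so the poll can last the full 60s.
        long withHardcoded = pollTimeoutMs(
                defaultApiTimeoutMs, hardcodedRequestTimeoutMs, hardcodedRequestTimeoutMs);

        // Case 2: request timeout taken from the (assumed) client config.
        // The poll is bounded by the shorter request timeout, leaving time for a retry.
        long withConfigured = pollTimeoutMs(
                defaultApiTimeoutMs, configuredRequestTimeoutMs, configuredRequestTimeoutMs);

        System.out.println("poll bound with hardcoded 1hr request timeout: " + withHardcoded + " ms");
        System.out.println("poll bound with configured request timeout:    " + withConfigured + " ms");
    }
}
```

Under these assumptions the first case is bounded only by the 60s api timeout, which matches the long poll described above, while the second case returns from the poll after 30s and still has room to retry before the api timeout expires.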