[ 
https://issues.apache.org/jira/browse/KAFKA-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yash Mayya updated KAFKA-14353:
-------------------------------
    Description: 
Kafka Connect currently defines a default REST API request timeout of [90 
seconds|https://github.com/apache/kafka/blob/5e399fe6f3aa65b42b9cdbf1c4c53f6989a570f0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectResource.java#L30].
 If a REST API request takes longer than this timeout value, a {{500 Internal 
Server Error}}  response is returned with the message "Request timed out".

The {{POST /connectors}} and {{PUT /connectors/\{connector}/config}} endpoints, 
which create or update connectors, validate the connector configuration (the 
details vary by connector plugin) before writing a message to the Connect 
cluster's config topic. If the validation takes longer than 90 seconds, the 
user receives a {{500 Internal Server Error}} response, yet the connector is 
still eventually created once the validation completes. This makes for a 
confusing user experience.
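
The behaviour described above can be illustrated with a minimal Python 
simulation. It is not Kafka Connect code: the names are hypothetical, the 
90-second timeout is scaled down so the sketch runs quickly, and a list stands 
in for the config topic. It assumes only what the description states, namely 
that the request stops waiting at the timeout while the validation and the 
config write carry on.

```python
import concurrent.futures
import time

# Hypothetical stand-ins, scaled down from 90 s so the sketch runs quickly.
REQUEST_TIMEOUT_S = 0.2
VALIDATION_TIME_S = 0.5

created_connectors = []  # stands in for the Connect cluster's config topic


def validate_and_create(name):
    """Slow config validation followed by the config write."""
    time.sleep(VALIDATION_TIME_S)    # slow, plugin-specific validation
    created_connectors.append(name)  # config is written once validation ends
    return name


def handle_create_request(executor, name):
    """Wait for the work only up to the request timeout."""
    future = executor.submit(validate_and_create, name)
    try:
        future.result(timeout=REQUEST_TIMEOUT_S)
        return 201, "Created", future
    except concurrent.futures.TimeoutError:
        # An error is returned, but the background work is not cancelled.
        return 500, "Request timed out", future


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    status, message, pending = handle_create_request(executor, "slow-db-connector")
    created_at_response_time = list(created_connectors)
    pending.result()  # let the validation finish in the background
```

In this sketch the caller sees status 500 while {{created_connectors}} is 
still empty, and the connector shows up only after the background validation 
finishes.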

The situation is made worse because config validation can occur twice for a 
single request. If Kafka Connect is running in distributed mode and the initial 
request is made to a worker that is not the leader, the request to create or 
update a connector is forwarded to the worker that is currently the leader of 
the group. In this case, the config validation occurs on both the initial 
worker and the leader (assuming that the first validation succeeds). This means 
that if each config validation takes longer than 45 seconds, the original 
create / update connector request times out.
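
The 45-second figure follows from splitting the 90-second timeout across two 
validations. A minimal Python sketch of that arithmetic is given here, with 
hypothetical function names, and assuming that the two validations run back to 
back and that forwarding and config topic write overhead are negligible:

```python
REQUEST_TIMEOUT_S = 90  # default REST request timeout from the description


def per_validation_budget_s(forwarded_to_leader):
    """Time available to each validation before the request times out."""
    validations = 2 if forwarded_to_leader else 1
    return REQUEST_TIMEOUT_S / validations


def request_times_out(validation_s, forwarded_to_leader):
    """True if validation time alone exceeds the request timeout.

    Simplification: ignores forwarding and config-topic write overhead.
    """
    validations = 2 if forwarded_to_leader else 1
    return validations * validation_s > REQUEST_TIMEOUT_S
```

Under these assumptions, a 50-second validation fits within the timeout when 
the request lands directly on the leader, but exceeds it when the request is 
forwarded and the validation runs twice.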

Slow config validations can occur in certain exceptional scenarios. Consider a 
database connector with elaborate validation logic that queries the information 
schema for a list of tables and views in order to validate the user's connector 
configuration. If the database has a very large number of tables and views and 
is under heavy query load, such information schema queries can be slow to 
complete.

  was:Kafka Connect currently defines a default REST API request timeout of [90 
seconds|https://github.com/apache/kafka/blob/5e399fe6f3aa65b42b9cdbf1c4c53f6989a570f0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectResource.java#L30]
 which isn't configurable. If a REST API request takes longer than this, a 
{{500 Internal Server Error}}  response is returned with the message "Request 
timed out". In exceptional scenarios, a longer timeout may be required for 
operations such as connector config validation / connector creation (which 
internally does a config validation first). We should allow the request timeout 
to be configurable via a Kafka Connect worker property.


> Kafka Connect REST API configuration validation timeout improvements
> --------------------------------------------------------------------
>
>                 Key: KAFKA-14353
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14353
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Yash Mayya
>            Assignee: Yash Mayya
>            Priority: Major
>              Labels: kip-required



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
