GWphua opened a new issue, #17475: URL: https://github.com/apache/druid/issues/17475
### Affected Version

Environments are run on Kubernetes clusters:
- Production env: 0.22.1
- Test env: 27

### Description

Cluster info: 2 Historicals, max heap 8 GB, direct memory size 16 GB. Limits of 4 CPUs and 24 GB memory.

According to the [docs](https://druid.apache.org/docs/latest/configuration/#historical-query-configs), Historical servers can be configured with a query timeout that stops long-running queries. The default value is 300_000 ms, which translates to 5 minutes.

I first encountered this problem when my Historical servers were overloaded with slow queries, and I wanted to set a 90 s timeout on the Historicals to alleviate it. However, setting `druid.server.http.defaultQueryTimeout=90000` does not appear to take effect. Grafana dashboards still report Historical latencies of up to 8 minutes, while Brokers show latencies of 5 minutes, which suggests the default 5-minute `druid.server.http.defaultQueryTimeout` is honored by the Brokers. (Perhaps Historicals need more time to process query cancellations? I read in [Cancel a Query](https://druid.apache.org/docs/latest/api-reference/sql-api#cancel-a-query) that Druid performs a "best-effort cancellation", but more detail on how this best effort is conducted, and whether cancellation actually stops Historical processing, would help.)

Adding a `timeout` under the [query context](https://druid.apache.org/docs/latest/querying/query-context) does help limit the latency. Here is a trial varying the query-context timeout from 500 ms to 5 s, and back:

To investigate further, I set up a test cluster and set `druid.server.http.defaultQueryTimeout` in `historical/runtime.properties` to a very low value (5 ms) to force a timeout condition. However, queries continued to execute as normal (Historical latencies of up to 10 s) despite this change.
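For reference, the query-context workaround described above can be sketched as a minimal payload builder. This is a hypothetical illustration, not the exact queries used in the trial: the datasource name, interval, and aggregation are placeholders, while `context.timeout` is the documented per-query override (in milliseconds) that takes precedence over the server-side default.

```python
# Sketch: build a Druid native timeseries query with a per-query timeout.
# Datasource, interval, and aggregation are hypothetical placeholders.
def build_query(datasource: str, timeout_ms: int) -> dict:
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": ["2015-01-01/2016-01-01"],
        "aggregations": [{"type": "count", "name": "rows"}],
        # Per-query override; takes precedence over the server-side
        # druid.server.http.defaultQueryTimeout (documented default: 300000 ms).
        "context": {"timeout": timeout_ms},
    }

query = build_query("trips_xaa", 5000)
```

The resulting JSON would be POSTed to `/druid/v2` on the Broker; in the trial above, only this context-level timeout was observed to actually bound latency.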
I added the following lines to `historical/runtime.properties` and rebuilt the Druid image to run on my Kubernetes setup:

```properties
druid.server.http.defaultQueryTimeout=90000
druid.server.http.maxQueryTimeout=90000
```

Despite these configurations, my Historical servers began experiencing issues loading segments, leaving Druid's provided `trips_xaa` datasource unqueryable. The error message indicated that the configured timeout had reverted to the default of 5 minutes.

While troubleshooting, I thoroughly checked the configuration spelling and reviewed `ServerConfig.java`, but found no discrepancies. This raises the possibility that the documentation misrepresents the ability to configure timeouts for Historical servers, and that the setting may only apply to Broker instances.

### Request for Clarification

If Druid is indeed lacking the capability to adjust timeout settings for Historical servers, or if there has been a change in the configuration naming, I kindly request an update to the documentation for clarity. Thank you for your attention to this matter!
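One way to check what a running service actually picked up is the `/status/properties` endpoint, which Druid services expose to report their effective runtime properties. A minimal sketch, assuming the endpoint returns a flat JSON object mapping property names to string values; the host and port in the comment are hypothetical:

```python
import json
import urllib.request

DEFAULT_TIMEOUT_MS = 300_000  # Druid's documented default for defaultQueryTimeout


def effective_default_query_timeout(props: dict) -> int:
    """Return the effective defaultQueryTimeout (ms) from a properties map,
    falling back to Druid's documented 300000 ms default when unset."""
    raw = props.get("druid.server.http.defaultQueryTimeout")
    return int(raw) if raw is not None else DEFAULT_TIMEOUT_MS


def fetch_properties(base_url: str) -> dict:
    # e.g. base_url = "http://historical-0:8083" (hypothetical host/port)
    with urllib.request.urlopen(f"{base_url}/status/properties") as resp:
        return json.load(resp)
```

Comparing this value on a Historical against the value configured in `runtime.properties` would confirm whether the setting was loaded at all, or silently ignored.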
