Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk opened a new issue, #17594: URL: https://github.com/apache/druid/issues/17594
High IO load on historical nodes during rolling upgrade/restart of a single node

### Affected Version

31.0.0

### Description

When changing configuration or upgrading the version, we restart `historical` nodes one by one, waiting until the restarted node becomes available again (registered in ZooKeeper). We run with a replication factor of 2. However, the `coordinator` appears to immediately assign load tasks to the remaining `historicals`, which causes high IO load on almost all running `historicals` and on deep storage (we use `cephfs`). In previous versions of Druid we did not see this behavior; redundancy was restored much more slowly.

Is there a way to tell the `coordinator` to delay redundancy recovery? The only `coordinator` parameter related to this situation that I can see is `replicationThrottleLimit`, but it does not prevent the redundancy-recovery load queue from appearing. There must be another setting to delay recovery completely (i.e., not send load tasks to `historicals` at all).
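For context, the kind of knob being asked about lives in the coordinator dynamic configuration, which can be updated at runtime via `POST /druid/coordinator/v1/config`. A sketch of such a payload follows; the values are illustrative only, and `pauseCoordination` is the closest existing setting I am aware of, though it pauses all coordination duties (including balancing and drops), not just replica recovery:

```json
{
  "replicationThrottleLimit": 50,
  "replicantLifetime": 15,
  "pauseCoordination": true
}
```

Setting `pauseCoordination` to `true` before the rolling restart and back to `false` afterwards would suppress the load-queue churn entirely, at the cost of suspending all other coordinator duties for the duration.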
