Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk opened a new issue, #17594:
URL: https://github.com/apache/druid/issues/17594

   High IO load on historical nodes during one of nodes upgrade/restart
   
   ### Affected Version
   
   31.0.0
   
   ### Description
   
   When changing the configuration or upgrading the version, we restart 
`historical` nodes one by one, waiting until the restarted node becomes 
available again (i.e. registered in Zookeeper). We have a replication factor 
of 2. However, it looks like the `coordinator` immediately assigns load tasks 
to the remaining `historicals`, which causes high IO load on almost all 
running `historicals` and on deep storage (we use `cephfs`). In previous 
versions of `Druid` we did not see this behavior; redundancy was recovered 
much more slowly.
   
   Is there a way to tell the `coordinator` to delay redundancy recovery?
   
   The only `coordinator` parameter related to this situation that I can see 
is `replicationThrottleLimit`, but it does not prevent the redundancy-recovery 
load queue from appearing. There should be another setting to delay recovery 
completely (i.e. not send load tasks to the `historicals` at all).
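   
   For context, `replicationThrottleLimit` is part of the coordinator's 
dynamic configuration, which can be updated at runtime via a POST to the 
coordinator's `/druid/coordinator/v1/config` endpoint. A minimal sketch of 
the payload we experimented with (the value `5` is purely illustrative, not 
a recommendation):
   
   ```json
   {
     "replicationThrottleLimit": 5
   }
   ```
   
   Lowering this does slow how many non-primary replicas are loaded per 
coordinator run, but as described above it only throttles the queue rather 
than suppressing recovery for a grace period while a node restarts.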


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
