elloooooo commented on issue #6503: Add rebalance strategy for RandomBalancerStrategy
URL: https://github.com/apache/incubator-druid/pull/6503#issuecomment-435736786

@clintropolis Thank you for your patient explanation.

> Hmm, I think at least `CachedCostBalancerStrategy` should be able to handle that many segments and many more, however it I know it is possible to make this not true via coordinator configuration. I would be interested in how your coordinator is tuned in the case where it is causing stuck handoff, like what are settings for max segments to move, etc? Additional cluster details such as number of historical nodes, type of deep storage, type of segment loading, etc, would be interesting as well if you're willing to share.

I will try harder to tune the `CachingCostBalancerStrategy` later, since, as you mentioned, its performance should be better.

> That said, if for some reason `CachingCostBalancerStrategy` doesn't fit your use case, I don't think these changes are quite appropriate to add to `RandomBalancerStrategy`. I believe these changes make it behave much less randomly as it would now balance on storage capacity usage, as far as I understand this PR. Rather, I think it would be more sensible to add a new balancer strategy that implements this functionality instead, maybe call it `DiskThresholdBalancerStrategy`. However, I think on the query side balancing like this is not going to be a great experience in terms of query speed, since nothing will prevent or actively try to avoid load being concentrated in hot spots.

I think every `XXXBalancerStrategy` plays two roles: one is to distribute new segments, and the other is to rebalance existing segments. The basic reason I added these changes here is that, as I see it, the principle for picking a server or segment is still totally random. The disk usage percentage is only taken into account when rebalancing segments; it does not change the principle of picking randomly.
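To make the two roles concrete, here is a minimal sketch of one possible reading of the idea. It is not the code in this PR and does not use Druid's actual `BalancerStrategy` interface: `ServerSketch`, `RandomWithDiskThresholdSketch`, the method names, and the `maxUsedFraction` parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical stand-in for a historical server; not a Druid class.
final class ServerSketch {
    final String name;
    final long maxSizeBytes;
    final long usedBytes;

    ServerSketch(String name, long maxSizeBytes, long usedBytes) {
        this.name = name;
        this.maxSizeBytes = maxSizeBytes;
        this.usedBytes = usedBytes;
    }
}

// Hypothetical sketch: the choice is always uniformly random; the disk
// usage check only narrows the candidate set when rebalancing.
final class RandomWithDiskThresholdSketch {
    private final Random random;
    private final double maxUsedFraction; // assumed threshold, e.g. 0.8

    RandomWithDiskThresholdSketch(Random random, double maxUsedFraction) {
        this.random = random;
        this.maxUsedFraction = maxUsedFraction;
    }

    // Role 1: place a new segment on a random server that has room for it.
    Optional<ServerSketch> pickServerForNewSegment(List<ServerSketch> servers, long segmentSizeBytes) {
        List<ServerSketch> candidates = new ArrayList<>();
        for (ServerSketch s : servers) {
            if (s.usedBytes + segmentSizeBytes <= s.maxSizeBytes) {
                candidates.add(s);
            }
        }
        return pickRandom(candidates);
    }

    // Role 2: pick a random rebalance destination among servers that would
    // stay at or below the disk usage threshold after receiving the segment.
    Optional<ServerSketch> pickRebalanceTarget(List<ServerSketch> servers, ServerSketch source, long segmentSizeBytes) {
        List<ServerSketch> candidates = new ArrayList<>();
        for (ServerSketch s : servers) {
            double usedAfterMove = (double) (s.usedBytes + segmentSizeBytes) / s.maxSizeBytes;
            if (s != source && usedAfterMove <= maxUsedFraction) {
                candidates.add(s);
            }
        }
        return pickRandom(candidates);
    }

    private Optional<ServerSketch> pickRandom(List<ServerSketch> candidates) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }
}
```

In this sketch the threshold never ranks servers against each other; it only removes servers that are too full from the pool, and the pick within the pool stays uniform.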
