radu-gheorghe commented on PR #1650: URL: https://github.com/apache/solr/pull/1650#issuecomment-1587759700
Awesome stuff, @HoustonPutman! Unfortunately, I think it's too much for me to really understand everything your code does, so I'll add some higher-level comments here instead. I hope they're not too dumb, but if they are, ignore them; if they're useful, great 😄 And of course let me know if you have any additional comments/questions.

* Splitting the `OrderedNodePlacementPlugin` changes into their own PR - I don't think so; the changes are too tightly integrated and splitting them would make things even harder to grok. It looks like that's the direction this PR is going towards anyway 🙂
* I saw a "parallel" flag, but I think replicas are moved one at a time, correct? If so, it will likely become a bottleneck down the road. Some ideas to fix it (maybe in a separate PR):
  * Allow rebalancing to happen in parallel for movements where the source AND the destination node are not on the list of source/destination nodes already participating in rebalancing (there's a rough sketch of this idea at the end of this comment). I'd say this is pretty safe to do, because if one allows/triggers rebalancing, one assumes this extra load can happen on any node.
  * Go the Elasticsearch route and limit the total number of concurrent rebalances AND the number of concurrent incoming/outgoing rebalances per node. The problem is, IMO, the network throughput used by replication. AFAIK, in Solr you can limit it per core rather than per node, so one has to keep that in mind to make sure rebalancing doesn't choke the network.
* Am I reading the code right that we compute all the shard movements that have to be done, then do them all, then check whether we were successful? That seems a little risky because, if rebalancing takes a long time, the cluster can change in the meantime (e.g. a node is added/fails, collections are added/removed...), so I guess one would have to cancel the rebalancing call and try again. The current approach has some advantages, but I have a feeling that if a call moved just one shard at a time (or N in parallel), then came back to see what the next best move is, and so on, it would be safer (and play nicer with the Operator loop, too). I'm thinking again of Elasticsearch (because all I have is a hammer 😅) and how it doesn't rebalance more stuff until all recoveries are done. Meaning that if a node went down, something/someone (Kubernetes or a human operator) could decide to stop the rebalance, replace the node, then resume. I hope I'm not way off here 🙈
* I saw a `maxBalanceSkew` property, but I'm not really sure what it does. Is it the number of "weight points" difference between nodes under which we say "don't bother"? And would that differ from one placement plugin to another?
* The balancing code that I saw for the Simple placement plugin seems more complicated than the `minimizeCores` one. In Affinity, we don't seem to really deal with the `withCollection` case, in the sense that we would probably want to move shards of both collections at the same time - we only check if the destination node has that collection, correct? Anyway, this is not meant as criticism, just to "sell" the crazy idea of having existing placement plugins **not** implement rebalancing and adding new ones instead, like:
  * the Simple plugin as you've just implemented it (with a different name)
  * the minimizeCores plugin as it is? Or maybe this one is simple enough to just remain as you extended it now?
  * for the Affinity plugin, starting from scratch would avoid dealing (at least initially) with edge cases like `withCollection` or maybe even replica types.
* Do I get it right that we soft-enforce constraints like AZ-awareness by adding big weights? If so, would it be better down the road to add them as hard constraints? Maybe in `canAddReplica`? That way we would only make balancing moves that are "legal", e.g. not putting two copies of the same shard in the same AZ or on the same node, or not placing a replica on a node with too little space, etc. (there's a rough sketch of such a check at the end of this comment). Maybe down the road we can add other constraints. One that I find really useful is the total number of shards of a collection on a node: if the cluster shrinks below a certain size, I'd prefer not to choke the remaining nodes with more replicas and create a domino effect. This last bit on "total shards per collection per node" isn't really about rebalancing, but about placing a new - e.g. recovered - replica. But now that we're in the realm of placement plugins, we might as well use weights for "placing" replicas in other contexts. And I think your PR already does that, no? If I simply create a new replica, the placement plugin will put it on the node with the lowest weight, correct?
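
To make the "parallel for disjoint nodes" idea a bit more concrete, here's a rough sketch. The `Movement` class and all of the names are made up for illustration and not the actual plugin API; the only point is that a movement joins the parallel batch only if neither its source nor its target node is already busy with another in-flight movement:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: pick movements that can safely run in parallel because they touch disjoint nodes. */
class ParallelBatchPicker {

  static class Movement {
    final String sourceNode;
    final String targetNode;
    final String replica;

    Movement(String sourceNode, String targetNode, String replica) {
      this.sourceNode = sourceNode;
      this.targetNode = targetNode;
      this.replica = replica;
    }
  }

  static List<Movement> pickParallelBatch(List<Movement> proposed, int maxParallel) {
    Set<String> busyNodes = new HashSet<>();
    List<Movement> batch = new ArrayList<>();
    for (Movement m : proposed) {
      if (batch.size() >= maxParallel) {
        break;
      }
      // Skip movements whose source or target node is already part of the batch.
      if (busyNodes.contains(m.sourceNode) || busyNodes.contains(m.targetNode)) {
        continue;
      }
      busyNodes.add(m.sourceNode);
      busyNodes.add(m.targetNode);
      batch.add(m);
    }
    return batch;
  }
}
```

The Elasticsearch-style per-node incoming/outgoing limits would just be extra counters consulted in the same loop.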
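
And a rough sketch of what a hard-constraint version of a `canAddReplica`-style check could look like. `NodeState` and the parameter names are hypothetical, just to illustrate the kind of checks I mean (same shard not twice on a node, cap on replicas per collection per node, disk headroom):

```java
/** Sketch only: hard constraints instead of big weights. */
class PlacementConstraints {

  /** Hypothetical read-only view of what a node already hosts. */
  interface NodeState {
    boolean hostsShard(String collection, String shard);
    int replicaCountForCollection(String collection);
    long freeDiskBytes();
  }

  static boolean canAddReplica(
      NodeState node,
      String collection,
      String shard,
      int maxReplicasPerCollectionPerNode,
      long minFreeDiskBytes) {
    // Never put two copies of the same shard on one node.
    if (node.hostsShard(collection, shard)) {
      return false;
    }
    // Cap replicas of a collection per node, so a shrinking cluster doesn't
    // pile everything onto the surviving nodes and trigger a domino effect.
    if (node.replicaCountForCollection(collection) >= maxReplicasPerCollectionPerNode) {
      return false;
    }
    // Basic disk headroom check.
    return node.freeDiskBytes() >= minFreeDiskBytes;
  }
}
```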