The new code has no notion of existing collections, so that can’t be an issue. It relies on detailed knowledge of the collection the Collection API is working on, a related collection when applicable, and per-node metrics and system properties.
Ilan

On Thu 29 Apr 2021 at 19:27, Gus Heck <[email protected]> wrote:

> IIRC it wasn't the nodes calculated, but rather the number of collections
> already in the cluster that caused the issue. See
> https://issues.apache.org/jira/browse/SOLR-14665
>
> On Thu, Apr 29, 2021 at 1:09 PM Ilan Ginzburg <[email protected]> wrote:
>
>> Yes Gus, this was verified, AB did some work around this.
>> Slowdown is linear on all cardinalities IIRC and absolute values are low.
>> For example computing placement of 10K replicas in less than 1 sec on
>> 5000 nodes, less than 3 sec on a 20K nodes cluster, placing 200K replicas
>> on 5000 nodes, most unfavorable case < 10 sec, most favorable < 1 sec.
>>
>> These are older numbers on a specific machine, the latest ones can be
>> generated by running AffinityPlacementFactoryTest.testScalability().
>>
>> In any case we are multiple orders of magnitude faster than Autoscaling
>> was.
>>
>> Ilan
>>
>> On Thu, Apr 29, 2021 at 5:37 PM Gus Heck <[email protected]> wrote:
>>
>>> Possibly it was discussed elsewhere or in related tickets and I missed
>>> it, but has the scaling scenario that caused problems (time to create
>>> collections increasing linearly with increasing number of collections) been
>>> tested and compared with the result that led to deprecation of autoscaling?
>>>
>>> On Thu, Apr 29, 2021 at 11:30 AM Ilan Ginzburg <[email protected]>
>>> wrote:
>>>
>>>> Expliciting (I think) your suggestion from the Slack thread Jan:
>>>>
>>>> - Add support for a new solr.xml config called something like
>>>>   forceDefaultLegacyPlacementStrategy
>>>> - Do not add anything in solr.xml
>>>>
>>>> At runtime:
>>>>
>>>> - If a placement plugin is explicitly configured (existing plugin
>>>>   config in ZK), use it,
>>>> - If forceDefaultLegacyPlacementStrategy is defined in solr.xml,
>>>>   use LEGACY
>>>> - If forceDefaultLegacyPlacementStrategy is not defined in solr.xml,
>>>>   use AffinityPlacementFactory
>>>>
>>>> I like it!
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Apr 29, 2021 at 5:23 PM Jan Høydahl <
>>>> [email protected]> wrote:
>>>>
>>>>> Bringing over a discussion from Slack
>>>>> <https://the-asf.slack.com/archives/CEKUCUNE9/p1619692977151000>
>>>>>
>>>>> In 9.0, the old Autoscaling is gone, and instead we have cluster level
>>>>> "Placement Plugins", see
>>>>> https://nightlies.apache.org/Solr/Solr-reference-guide-main/replica-placement-plugins.html
>>>>>
>>>>> The default behaviour on main branch now is "Legacy", described like
>>>>> this in ref-guide:
>>>>>
>>>>> Legacy placement simply assigns new replicas to live nodes in a
>>>>> round-robin fashion: first it prepares a sorted list of nodes with the
>>>>> smallest number of existing replicas of the collection. Then for each
>>>>> shard in the request it adds the replicas to consecutive nodes in this
>>>>> order, wrapping around to the first node if the number of replicas is
>>>>> larger than the number of nodes.
>>>>> This placement strategy doesn’t ensure that no more than 1 replica of
>>>>> a shard is placed on the same node. Also, the round-robin assignment only
>>>>> roughly approximates an even spread of replicas across the nodes.
>>>>>
>>>>>
>>>>> From the Slack discussion there seems to be a willingness to default
>>>>> to one of the brand new placement plugins, the AffinityPlacementFactory,
>>>>> which is described as
>>>>>
>>>>> This plugin implements replica placement algorithm that roughly
>>>>> replicates this Solr 8.x autoscaling configuration defined here
>>>>> <https://github.com/lucidworks/fusion-cloud-native/blob/master/policy.json#L16>
>>>>> :
>>>>>
>>>>>
>>>>> The autoscaling specification in the configuration linked above aimed
>>>>> to do the following:
>>>>>
>>>>> - spread replicas per shard as evenly as possible across multiple
>>>>>   availability zones (given by a system property),
>>>>> - assign replicas based on replica type to specific kinds of nodes
>>>>>   (another system property), and
>>>>> - avoid having more than one replica per shard on the same node.
>>>>> - only after the above constraints are satisfied:
>>>>>    - minimize cores per node, or
>>>>>    - minimize disk usage.
>>>>>
>>>>>
>>>>> So the proposal is to make an instance of AffinityPlacementFactory the
>>>>> default, with some universally sane defaults for config - either
>>>>> configured in the default solr.xml or in java code.
>>>>>
>>>>> We can make the formal decision in this email thread - by lazy
>>>>> consensus.
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
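For reference, the runtime rules proposed in the quoted thread can be sketched roughly as follows. This is only an illustration in Python, not Solr code: ClusterConfig, choose_placement and the field names are hypothetical, and "defined in solr.xml" is modelled as a plain boolean, which is an assumption about how the setting would be read.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration; these are not Solr's actual constants.
LEGACY = "LEGACY"
AFFINITY = "AffinityPlacementFactory"


@dataclass
class ClusterConfig:
    # Name of a placement plugin explicitly configured in ZK, or None if absent.
    zk_placement_plugin: Optional[str] = None
    # Assumption: True when forceDefaultLegacyPlacementStrategy is defined in solr.xml.
    force_default_legacy: bool = False


def choose_placement(config: ClusterConfig) -> str:
    """Apply the three proposed rules in order of precedence."""
    # 1. An explicit plugin config in ZK always wins.
    if config.zk_placement_plugin is not None:
        return config.zk_placement_plugin
    # 2. Otherwise the solr.xml setting forces the legacy strategy.
    if config.force_default_legacy:
        return LEGACY
    # 3. Otherwise fall back to the new default.
    return AFFINITY


if __name__ == "__main__":
    print(choose_placement(ClusterConfig()))
    print(choose_placement(ClusterConfig(force_default_legacy=True)))
    print(choose_placement(ClusterConfig(zk_placement_plugin="MyPlugin",
                                         force_default_legacy=True)))
```

The ordering matters: an explicit plugin configuration takes precedence even when the solr.xml setting is present, so the setting only changes what "no configuration" means.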
