IIRC it wasn't the number of nodes in the calculation, but rather the number of collections already in the cluster, that caused the issue. See https://issues.apache.org/jira/browse/SOLR-14665
On Thu, Apr 29, 2021 at 1:09 PM Ilan Ginzburg wrote:

> Yes Gus, this was verified, AB did some work around this.
> Slowdown is linear on all cardinalities IIRC and absolute values are low.
> For example computing placement of 10K replicas in less than 1 sec on 5000
> nodes, less than 3 sec on a 20K nodes cluster, placing 200K replicas on
> 5000 nodes, most unfavorable case < 10 sec, most favorable < 1 sec.
>
> These are older numbers on a specific machine, the latest ones can be
> generated by running AffinityPlacementFactoryTest.testScalability().
>
> In any case we are multiple orders of magnitude faster than Autoscaling
> was.
>
> Ilan
>
> On Thu, Apr 29, 2021 at 5:37 PM Gus Heck wrote:
>
>> Possibly it was discussed elsewhere or in related tickets and I missed
>> it, but has the scaling scenario that caused problems (time to create
>> collections increasing linearly with increasing number of collections) been
>> tested and compared with the result that led to deprecation of autoscaling?
>>
>> On Thu, Apr 29, 2021 at 11:30 AM Ilan Ginzburg
>> wrote:
>>
>>> Making explicit (I think) your suggestion from the Slack thread, Jan:
>>>
>>> - Add support for a new solr.xml config called something like
>>> forceDefaultLegacyPlacementStrategy
>>> - Do not add anything in solr.xml
>>>
>>> At runtime:
>>>
>>> - If a placement plugin is explicitly configured (existing plugin
>>> config in ZK), use it,
>>> - If forceDefaultLegacyPlacementStrategy is defined in solr.xml, use
>>> LEGACY
>>> - If forceDefaultLegacyPlacementStrategy is not defined in solr.xml,
>>> use AffinityPlacementFactory
>>>
>>> I like it!
>>>
>>> Ilan
>>>
>>> On Thu, Apr 29, 2021 at 5:23 PM Jan Høydahl
>>> wrote:
>>>
>>>> Bringing over a discussion from Slack
>>>> <https://the-asf.slack.com/archives/CEKUCUNE9/p1619692977151000>
>>>>
>>>> In 9.0, the old Autoscaling is gone, and instead we have cluster level
>>>> "Placement Plugins", see
>>>> https://nightlies.apache.org/Solr/Solr-reference-guide-main/replica-placement-plugins.html
>>>>
>>>> The default behaviour on main branch now is "Legacy", described like
>>>> this in ref-guide:
>>>>
>>>> Legacy placement simply assigns new replicas to live nodes in a
>>>> round-robin fashion: first it prepares a sorted list of nodes with the
>>>> smallest number of existing replicas of the collection. Then for each shard
>>>> in the request it adds the replicas to consecutive nodes in this order,
>>>> wrapping around to the first node if the number of replicas is larger than
>>>> the number of nodes.
>>>> This placement strategy doesn’t ensure that no more than 1 replica of a
>>>> shard is placed on the same node. Also, the round-robin assignment only
>>>> roughly approximates an even spread of replicas across the nodes.
>>>>
>>>> From the Slack discussion there seems to be a willingness to default to
>>>> one of the brand new placement plugins, the AffinityPlacementFactory, which
>>>> is described as
>>>>
>>>> This plugin implements replica placement algorithm that roughly
>>>> replicates this Solr 8.x autoscaling configuration defined here
>>>> <https://github.com/lucidworks/fusion-cloud-native/blob/master/policy.json#L16>
>>>> :
>>>>
>>>> The autoscaling specification in the configuration linked above aimed
>>>> to do the following:
>>>>
>>>> - spread replicas per shard as evenly as possible across multiple
>>>> availability zones (given by a system property),
>>>> - assign replicas based on replica type to specific kinds of nodes
>>>> (another system property), and
>>>> - avoid having more than one replica per shard on the same node.
>>>> - only after the above constraints are satisfied:
>>>>    - minimize cores per node, or
>>>>    - minimize disk usage.
>>>>
>>>> So the proposal is to make an instance of AffinityPlacementFactory the
>>>> default, with some universally sane defaults for config - either configured
>>>> in the default solr.xml or in java code.
>>>>
>>>> We can make the formal decision in this email thread - by lazy
>>>> consensus.
>>>>
>>>> Jan
>>>>
>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
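
For reference, the runtime precedence Ilan lists in the quoted thread (explicit plugin config in ZK first, then forceDefaultLegacyPlacementStrategy from solr.xml, otherwise AffinityPlacementFactory) could be sketched roughly as below. This is only an illustration of the proposal, not Solr code: the class, enum, and method names are hypothetical, and the two booleans stand in for however Solr would actually read ZK and solr.xml.

```java
/**
 * Hypothetical sketch of the default-selection order proposed in the thread.
 * None of these names are Solr APIs; they exist only for illustration.
 */
final class PlacementDefaultSketch {

    /** The three possible outcomes discussed in the thread. */
    enum Strategy { CONFIGURED_PLUGIN, LEGACY, AFFINITY }

    /**
     * @param pluginConfiguredInZk true if a placement plugin config already exists in ZK
     * @param forceLegacyInSolrXml true if forceDefaultLegacyPlacementStrategy
     *                             (the proposed, not yet existing, setting) is defined in solr.xml
     */
    static Strategy choose(boolean pluginConfiguredInZk, boolean forceLegacyInSolrXml) {
        // 1. An explicitly configured plugin always wins.
        if (pluginConfiguredInZk) {
            return Strategy.CONFIGURED_PLUGIN;
        }
        // 2. Otherwise the solr.xml switch forces the old round-robin behaviour.
        if (forceLegacyInSolrXml) {
            return Strategy.LEGACY;
        }
        // 3. Otherwise fall back to the proposed new default.
        return Strategy.AFFINITY;
    }

    public static void main(String[] args) {
        // No ZK config, no solr.xml switch: the proposed new default applies.
        System.out.println(choose(false, false));
    }
}
```

The point of ordering the checks this way is that an existing cluster with a plugin already configured sees no change, and the solr.xml switch only affects clusters that have not configured anything.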

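
To make the ref-guide description of Legacy placement quoted above a bit more concrete, here is a minimal sketch of that round-robin idea. It is not the actual Solr implementation. The class and method names are hypothetical, and two details are my assumptions because the description does not pin them down: ties between nodes are broken by node name, and the position in the node list carries over from one shard to the next.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of round-robin "Legacy" placement; not Solr code. */
final class LegacyPlacementSketch {

    /**
     * @param existingReplicasPerNode live node name -> replicas of this collection already on it
     * @param shards                  shard names in the request
     * @param replicasPerShard        number of new replicas to place for each shard
     * @return shard name -> nodes chosen for its new replicas, in placement order
     */
    static Map<String, List<String>> place(Map<String, Integer> existingReplicasPerNode,
                                           List<String> shards,
                                           int replicasPerShard) {
        if (existingReplicasPerNode.isEmpty()) {
            throw new IllegalArgumentException("no live nodes");
        }
        // Sort nodes by existing replica count, fewest first.
        // Assumption: ties are broken by node name to keep the result deterministic.
        List<String> nodes = new ArrayList<>(existingReplicasPerNode.keySet());
        nodes.sort((a, b) -> {
            int byCount = Integer.compare(existingReplicasPerNode.get(a),
                                          existingReplicasPerNode.get(b));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        Map<String, List<String>> placements = new LinkedHashMap<>();
        int next = 0; // Assumption: the cursor continues across shards.
        for (String shard : shards) {
            List<String> chosen = new ArrayList<>();
            for (int i = 0; i < replicasPerShard; i++) {
                // Wrap around to the first node when the list is exhausted.
                chosen.add(nodes.get(next % nodes.size()));
                next++;
            }
            placements.put(shard, chosen);
        }
        return placements;
    }
}
```

Note that nothing here prevents a node from being picked twice for the same shard once the list wraps around, which is the limitation the ref-guide text calls out.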