The new code has no notion of existing collections so that can’t be an
issue.
It relies on detailed knowledge of the collection the Collection API call
is operating on, a related collection when applicable, and per-node
metrics/system properties.

Ilan

On Thu 29 Apr 2021 at 19:27, Gus Heck <[email protected]> wrote:

> IIRC it wasn't the nodes calculated, but rather the number of collections
> already in the cluster that caused the issue. See
> https://issues.apache.org/jira/browse/SOLR-14665
>
> On Thu, Apr 29, 2021 at 1:09 PM Ilan Ginzburg <[email protected]> wrote:
>
>> Yes Gus, this was verified; AB did some work around this.
>> Slowdown is linear on all cardinalities IIRC and absolute values are low.
>> For example, computing placement of 10K replicas takes less than 1 sec on
>> 5000 nodes and less than 3 sec on a 20K-node cluster. Placing 200K replicas
>> on 5000 nodes takes < 10 sec in the most unfavorable case and < 1 sec in
>> the most favorable.
>>
>> These are older numbers from a specific machine; the latest ones can be
>> generated by running AffinityPlacementFactoryTest.testScalability().
>>
>> In any case we are multiple orders of magnitude faster than Autoscaling
>> was.
>>
>> Ilan
>>
>> On Thu, Apr 29, 2021 at 5:37 PM Gus Heck <[email protected]> wrote:
>>
>>> Possibly it was discussed elsewhere or in related tickets and I missed
>>> it, but has the scaling scenario that caused problems (time to create
>>> collections increasing linearly with increasing number of collections) been
>>> tested and compared with the result that led to the deprecation of autoscaling?
>>>
>>> On Thu, Apr 29, 2021 at 11:30 AM Ilan Ginzburg <[email protected]>
>>> wrote:
>>>
>>>> Spelling out (I think) your suggestion from the Slack thread, Jan:
>>>>
>>>>    - Add support for a new solr.xml config called something like
>>>>    forceDefaultLegacyPlacementStrategy
>>>>    - Do not add anything in solr.xml
>>>>
>>>> At runtime:
>>>>
>>>>    - If a placement plugin is explicitly configured (existing plugin
>>>>    config in ZK), use it.
>>>>    - Otherwise, if forceDefaultLegacyPlacementStrategy is defined in
>>>>    solr.xml, use LEGACY.
>>>>    - Otherwise (forceDefaultLegacyPlacementStrategy is not defined in
>>>>    solr.xml), use AffinityPlacementFactory.
>>>>
>>>> I like it!
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Apr 29, 2021 at 5:23 PM Jan Høydahl <
>>>> [email protected]> wrote:
>>>>
>>>>> Bringing over a discussion from Slack
>>>>> <https://the-asf.slack.com/archives/CEKUCUNE9/p1619692977151000>
>>>>>
>>>>> In 9.0, the old Autoscaling is gone, and instead we have cluster level
>>>>> "Placement Plugins", see
>>>>> https://nightlies.apache.org/Solr/Solr-reference-guide-main/replica-placement-plugins.html
>>>>>
>>>>> The default behavior on the main branch now is "Legacy", described like
>>>>> this in ref-guide:
>>>>>
>>>>> Legacy placement simply assigns new replicas to live nodes in a
>>>>> round-robin fashion: first it prepares a sorted list of nodes with the
>>>>> smallest number of existing replicas of the collection. Then for each shard
>>>>> in the request it adds the replicas to consecutive nodes in this order,
>>>>> wrapping around to the first node if the number of replicas is larger than
>>>>> the number of nodes.
>>>>> This placement strategy doesn’t ensure that no more than 1 replica of
>>>>> a shard is placed on the same node. Also, the round-robin assignment only
>>>>> roughly approximates an even spread of replicas across the nodes.
>>>>>
>>>>>
>>>>> From the Slack discussion there seems to be a willingness to default
>>>>> to one of the brand new placement plugins, the AffinityPlacementFactory,
>>>>> which is described as
>>>>>
>>>>> This plugin implements a replica placement algorithm that roughly
>>>>> replicates this Solr 8.x autoscaling configuration defined here
>>>>> <https://github.com/lucidworks/fusion-cloud-native/blob/master/policy.json#L16>
>>>>> :
>>>>>
>>>>>
>>>>> The autoscaling specification in the configuration linked above aimed
>>>>> to do the following:
>>>>>
>>>>>    - spread replicas per shard as evenly as possible across multiple
>>>>>    availability zones (given by a system property),
>>>>>    - assign replicas based on replica type to specific kinds of nodes
>>>>>    (another system property), and
>>>>>    - avoid having more than one replica per shard on the same node.
>>>>>    - only after the above constraints are satisfied:
>>>>>       - minimize cores per node, or
>>>>>       - minimize disk usage.
>>>>>
>>>>>
>>>>> So the proposal is to make an instance of AffinityPlacementFactory the
>>>>> default, with some universally sane defaults for config - either configured
>>>>> in the default solr.xml or in Java code.
>>>>>
>>>>> We can make the formal decision in this email thread - by lazy
>>>>> consensus.
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
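
For reference, the runtime selection rules quoted above (explicit plugin
config in ZK first, then forceDefaultLegacyPlacementStrategy in solr.xml,
then AffinityPlacementFactory) could be sketched roughly as below. This is
only an illustration and not actual Solr code: the class, enum and method
names are hypothetical, and the two parameters are stand-ins for whatever
ends up reading the plugin config from ZK and the flag from solr.xml.

```java
import java.util.Optional;

public class PlacementSelection {

    /** Hypothetical labels for the three possible outcomes. */
    public enum Strategy { CONFIGURED_PLUGIN, LEGACY, AFFINITY }

    /**
     * Picks the placement strategy following the precedence in the proposal.
     *
     * @param pluginConfigInZk   assumed stand-in for a placement plugin
     *                           config found in ZK (empty if none exists)
     * @param forceLegacyDefined assumed stand-in for "is
     *                           forceDefaultLegacyPlacementStrategy defined
     *                           in solr.xml"
     */
    public static Strategy choose(Optional<String> pluginConfigInZk,
                                  boolean forceLegacyDefined) {
        // 1. An explicitly configured plugin always wins.
        if (pluginConfigInZk.isPresent()) {
            return Strategy.CONFIGURED_PLUGIN;
        }
        // 2. No explicit plugin: the solr.xml flag forces LEGACY.
        if (forceLegacyDefined) {
            return Strategy.LEGACY;
        }
        // 3. Neither is set: fall back to the new default.
        return Strategy.AFFINITY;
    }

    public static void main(String[] args) {
        // "my.custom.Factory" is a made-up plugin class name.
        System.out.println(choose(Optional.of("my.custom.Factory"), true)); // CONFIGURED_PLUGIN
        System.out.println(choose(Optional.empty(), true));                 // LEGACY
        System.out.println(choose(Optional.empty(), false));                // AFFINITY
    }
}
```

The point of the ordering is that the solr.xml flag only changes the
default, so it never overrides a plugin that was configured on purpose.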
