Re: Proposal to use AffinityPlacementFactory as default in 9.0

Ilan Ginzburg Thu, 29 Apr 2021 12:11:11 -0700

The new code has no notion of existing collections, so that can’t be an
issue.
It relies on detailed knowledge of the collection the Collection API is
working on, of a related collection when applicable, and on per-node
metrics/system properties.


Ilan

On Thu 29 Apr 2021 at 19:27, Gus Heck <[email protected]> wrote:

> IIRC it wasn't the number of nodes in the calculation, but rather the number
> of collections already in the cluster that caused the issue. See
> https://issues.apache.org/jira/browse/SOLR-14665
>
> On Thu, Apr 29, 2021 at 1:09 PM Ilan Ginzburg <[email protected]> wrote:
>
>> Yes Gus, this was verified; AB did some work around this.
>> IIRC the slowdown is linear in all cardinalities, and the absolute values
>> are low. For example:
>>    - placing 10K replicas takes less than 1 sec on 5000 nodes, and less
>>    than 3 sec on a 20K-node cluster;
>>    - placing 200K replicas on 5000 nodes takes < 10 sec in the most
>>    unfavorable case and < 1 sec in the most favorable one.
>>
>> These are older numbers from a specific machine; the latest ones can be
>> generated by running AffinityPlacementFactoryTest.testScalability().
>>
>> In any case we are multiple orders of magnitude faster than Autoscaling
>> was.
>>
>> Ilan
>>
>> On Thu, Apr 29, 2021 at 5:37 PM Gus Heck <[email protected]> wrote:
>>
>>> Possibly it was discussed elsewhere or in related tickets and I missed
>>> it, but has the scaling scenario that caused problems (time to create
>>> collections increasing linearly with the number of collections) been
>>> tested and compared with the result that led to the deprecation of
>>> autoscaling?
>>>
>>> On Thu, Apr 29, 2021 at 11:30 AM Ilan Ginzburg <[email protected]>
>>> wrote:
>>>
>>>> Spelling out (I think) your suggestion from the Slack thread, Jan:
>>>>
>>>>    - Add support for a new solr.xml config called something like
>>>>    forceDefaultLegacyPlacementStrategy
>>>>    - Do not add anything to the default solr.xml
>>>>
>>>> At runtime:
>>>>
>>>>    - If a placement plugin is explicitly configured (existing plugin
>>>>    config in ZK), use it.
>>>>    - Otherwise, if forceDefaultLegacyPlacementStrategy is defined in
>>>>    solr.xml, use LEGACY.
>>>>    - Otherwise (forceDefaultLegacyPlacementStrategy not defined in
>>>>    solr.xml), use AffinityPlacementFactory.
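The runtime rule above boils down to a simple precedence check. Here is a hypothetical Java sketch of it, not Solr code: the class, enum and method names are invented, and the ZK plugin config and the solr.xml flag are modelled as plain parameters.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed precedence; none of these names are Solr APIs.
final class PlacementStrategyResolver {

    enum Strategy { CONFIGURED_PLUGIN, LEGACY, AFFINITY }

    /**
     * @param zkPluginClass          class of an explicitly configured placement plugin in ZK, if any
     * @param forceLegacyFlagDefined whether forceDefaultLegacyPlacementStrategy is defined in solr.xml
     */
    static Strategy resolve(Optional<String> zkPluginClass, boolean forceLegacyFlagDefined) {
        // 1. An explicit plugin configuration in ZK always wins.
        if (zkPluginClass.isPresent()) {
            return Strategy.CONFIGURED_PLUGIN;
        }
        // 2. The solr.xml flag opts back into legacy round-robin placement.
        if (forceLegacyFlagDefined) {
            return Strategy.LEGACY;
        }
        // 3. Otherwise the new default applies.
        return Strategy.AFFINITY;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.of("com.example.MyPlacementFactory"), true)); // CONFIGURED_PLUGIN
        System.out.println(resolve(Optional.empty(), true));  // LEGACY
        System.out.println(resolve(Optional.empty(), false)); // AFFINITY
    }
}
```

Because the shipped solr.xml would not define the flag, a fresh install lands in the third branch, while existing users can opt out with one line of config.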
>>>>
>>>> I like it!
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Apr 29, 2021 at 5:23 PM Jan Høydahl <
>>>> [email protected]> wrote:
>>>>
>>>>> Bringing over a discussion from Slack
>>>>> <https://the-asf.slack.com/archives/CEKUCUNE9/p1619692977151000>
>>>>>
>>>>> In 9.0, the old Autoscaling is gone, and instead we have cluster-level
>>>>> "Placement Plugins", see
>>>>> https://nightlies.apache.org/Solr/Solr-reference-guide-main/replica-placement-plugins.html
>>>>>
>>>>> The default behaviour on the main branch is now "Legacy", described like
>>>>> this in the ref guide:
>>>>>
>>>>> Legacy placement simply assigns new replicas to live nodes in a
>>>>> round-robin fashion: first it prepares a sorted list of nodes with the
>>>>> smallest number of existing replicas of the collection. Then for each
>>>>> shard in the request it adds the replicas to consecutive nodes in this
>>>>> order, wrapping around to the first node if the number of replicas is
>>>>> larger than the number of nodes.
>>>>> This placement strategy doesn’t ensure that no more than 1 replica of
>>>>> a shard is placed on the same node. Also, the round-robin assignment only
>>>>> roughly approximates an even spread of replicas across the nodes.
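A minimal Java sketch of the round-robin behaviour quoted above, assuming a single cursor that keeps advancing across all shards of the request (the quoted description does not say whether it resets per shard). Class and method names are hypothetical. With two nodes and three replicas of one shard, it puts two replicas of that shard on the same node, which is exactly the weakness the quote calls out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of legacy round-robin placement, not Solr's code.
final class LegacyPlacementSketch {

    record Assignment(String shard, int replicaIndex, String node) {}

    static List<Assignment> place(Map<String, Integer> existingReplicasPerNode,
                                  List<String> shards, int replicasPerShard) {
        // Sort live nodes by how many replicas of this collection they already host
        // (fewest first); the node name breaks ties so the result is deterministic.
        List<String> nodes = new ArrayList<>(existingReplicasPerNode.keySet());
        nodes.sort(Comparator.<String>comparingInt(existingReplicasPerNode::get)
                .thenComparing(Comparator.naturalOrder()));

        List<Assignment> result = new ArrayList<>();
        int cursor = 0; // assumption: one cursor shared by all shards in the request
        for (String shard : shards) {
            for (int r = 0; r < replicasPerShard; r++) {
                // Wrap around to the first node once the end of the list is reached.
                result.add(new Assignment(shard, r, nodes.get(cursor % nodes.size())));
                cursor++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two nodes, one shard with three replicas: nodes a, b, a (two replicas of shard1 on a).
        List<Assignment> plan = place(Map.of("a", 0, "b", 1), List.of("shard1"), 3);
        plan.forEach(System.out::println);
    }
}
```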
>>>>>
>>>>>
>>>>> From the Slack discussion there seems to be a willingness to default
>>>>> to one of the brand new placement plugins, the AffinityPlacementFactory,
>>>>> which is described as
>>>>>
>>>>> This plugin implements a replica placement algorithm that roughly
>>>>> replicates this Solr 8.x autoscaling configuration defined here
>>>>> <https://github.com/lucidworks/fusion-cloud-native/blob/master/policy.json#L16>:
>>>>>
>>>>>
>>>>> The autoscaling specification in the configuration linked above aimed
>>>>> to do the following:
>>>>>
>>>>>    - spread replicas per shard as evenly as possible across multiple
>>>>>    availability zones (given by a system property),
>>>>>    - assign replicas based on replica type to specific kinds of nodes
>>>>>    (another system property), and
>>>>>    - avoid having more than one replica per shard on the same node.
>>>>>    - only after the above constraints are satisfied:
>>>>>       - minimize cores per node, or
>>>>>       - minimize disk usage.
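To make the constraint ordering above concrete, here is a greedy Java sketch that places the replicas of a single shard. The replica-type rule and the one-replica-per-node rule act as hard filters, AZ spread is the first preference, and core count breaks ties. It is an illustration under assumptions, not the actual AffinityPlacementFactory algorithm: the Node model, the replica-type strings and the method names are hypothetical, and disk usage is ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical greedy sketch of the constraints listed above; not Solr's implementation.
final class AffinitySketch {

    /** Hypothetical node model: availability zone, accepted replica types, current core count. */
    static final class Node {
        final String name;
        final String az;
        final Set<String> replicaTypes;
        int cores;

        Node(String name, String az, Set<String> replicaTypes, int cores) {
            this.name = name;
            this.az = az;
            this.replicaTypes = replicaTypes;
            this.cores = cores;
        }
    }

    /** Returns node names for each requested replica type, or empty if hard constraints fail. */
    static Optional<List<String>> placeShard(List<Node> nodes, List<String> replicaTypes) {
        Set<String> usedNodes = new HashSet<>();
        Map<String, Integer> replicasPerAz = new HashMap<>();
        List<String> chosen = new ArrayList<>();
        for (String type : replicaTypes) {
            Optional<Node> best = nodes.stream()
                    // Hard constraint: the node must accept this replica type.
                    .filter(n -> n.replicaTypes.contains(type))
                    // Hard constraint: at most one replica of this shard per node.
                    .filter(n -> !usedNodes.contains(n.name))
                    // Prefer the AZ holding the fewest replicas of this shard,
                    // then the node with the fewest cores, then the name for determinism.
                    .min(Comparator.<Node>comparingInt(n -> replicasPerAz.getOrDefault(n.az, 0))
                            .thenComparingInt(n -> n.cores)
                            .thenComparing(n -> n.name));
            if (best.isEmpty()) {
                return Optional.empty();
            }
            Node n = best.get();
            usedNodes.add(n.name);
            replicasPerAz.merge(n.az, 1, Integer::sum);
            n.cores++; // count the new core so later choices see it
            chosen.add(n.name);
        }
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("n1", "az1", Set.of("NRT", "TLOG"), 3),
                new Node("n2", "az1", Set.of("NRT", "TLOG"), 1),
                new Node("n3", "az2", Set.of("NRT", "TLOG"), 5),
                new Node("n4", "az2", Set.of("PULL"), 0));
        // prints Optional[[n2, n3, n4]]
        System.out.println(placeShard(nodes, List.of("NRT", "NRT", "PULL")));
    }
}
```

Picking the least-used AZ one replica at a time only approximates an even spread; note how the second NRT replica goes to the busier n3 because it sits in the other AZ.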
>>>>>
>>>>>
>>>>> So the proposal is to make an instance of AffinityPlacementFactory the
>>>>> default, with some universally sane config defaults, configured either
>>>>> in the default solr.xml or in Java code.
>>>>>
>>>>> We can make the formal decision in this email thread - by lazy
>>>>> consensus.
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
