Re: First class support for node roles

Ishan Chattopadhyaya Mon, 01 Nov 2021 09:36:51 -0700

In this SIP, I'm mainly trying to introduce the broader concept of first-class
roles and a mechanism to assign roles to nodes. I'm then advocating that the
implementation of each role be left to decide how to use those roles.
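As a rough illustration of "first-class roles", here is a minimal Java sketch. It assumes roles are passed as a comma-separated node.roles system property (as in the -Dnode.roles=data examples in this thread) and that the built-in roles form a fixed, known set. The class NodeRoles and its role list are hypothetical, not Solr's actual code. The point is only that the framework records which roles a node has, while each role's implementation calls hasRole and decides for itself what holding the role means. Rejecting unknown names at startup also addresses the concern about typos in role names raised further down the thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: the roles assigned to a node, parsed from a
// comma-separated value such as the one passed via -Dnode.roles=data,overseer.
final class NodeRoles {
    // Assumed set of built-in roles; "zookeeper" is only a proposed role in this thread.
    static final Set<String> KNOWN_ROLES = Collections.unmodifiableSet(
        new LinkedHashSet<>(Arrays.asList("data", "overseer", "zookeeper")));

    private final Set<String> roles;

    private NodeRoles(Set<String> roles) {
        this.roles = Collections.unmodifiableSet(roles);
    }

    // Parses the property value; fails fast on unknown names so typos surface at startup.
    static NodeRoles parse(String value) {
        Set<String> parsed = new LinkedHashSet<>();
        for (String token : value.split(",")) {
            String role = token.trim().toLowerCase(Locale.ROOT);
            if (role.isEmpty()) {
                continue; // tolerate stray commas such as "data,,overseer"
            }
            if (!KNOWN_ROLES.contains(role)) {
                throw new IllegalArgumentException("Unknown node role: '" + role + "'");
            }
            parsed.add(role);
        }
        return new NodeRoles(parsed);
    }

    // A role's implementation only asks this question; what the role means is up to it.
    boolean hasRole(String role) {
        return roles.contains(role);
    }

    Set<String> roles() {
        return roles;
    }

    public static void main(String[] args) {
        NodeRoles r = NodeRoles.parse("data, overseer");
        System.out.println(r.roles());              // [data, overseer]
        System.out.println(r.hasRole("zookeeper")); // false
    }
}
```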


> Roles should never change without restart. 1-100 should not suddenly
> become ineligible when you start 101.
> Certainly that leaves you with a situation where at least briefly the
> overseer is on an invalid node. Keep in mind none of this takes effect
> until they upgrade in the first place.

Here, we already have a role called "overseer", which designates some
nodes as "preferred overseers". I understand that this could have been
implemented differently, but that is a discussion of the past, and in the
future we can change the implementation of the overseer role. However, the
specific handling of the overseer role falls within the scope of the
overseer implementation, and I don't think our broader discussion of this
SIP needs to resolve it here and now.
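To make "the overseer implementation decides" concrete, here is a hedged Java sketch in which the overseer treats its role purely as a preference: it picks a live node carrying the overseer role if one exists, and otherwise falls back to any live node, since a cluster always needs an overseer. OverseerPreference and chooseOverseer are hypothetical names, and the sketch says nothing about how Solr's real ZooKeeper-based overseer election works.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: the overseer implementation, not the role framework,
// decides how to interpret the "overseer" role. Here it is only a preference.
final class OverseerPreference {

    // liveNodes: live node names in election order (e.g. the order they joined).
    // rolesByNode: node name -> roles that node was started with.
    static Optional<String> chooseOverseer(List<String> liveNodes, Map<String, Set<String>> rolesByNode) {
        // First choice: a live node that explicitly carries the "overseer" role.
        for (String node : liveNodes) {
            if (rolesByNode.getOrDefault(node, Set.of()).contains("overseer")) {
                return Optional.of(node);
            }
        }
        // Fallback: a cluster always needs an overseer, so take the first live node.
        return liveNodes.isEmpty() ? Optional.empty() : Optional.of(liveNodes.get(0));
    }

    public static void main(String[] args) {
        List<String> live = List.of("solr1", "solr2", "solr101");
        Map<String, Set<String>> roles = Map.of(
            "solr1", Set.of("data"),
            "solr2", Set.of("data"),
            "solr101", Set.of("overseer"));
        System.out.println(chooseOverseer(live, roles).orElse("none"));           // solr101
        System.out.println(chooseOverseer(List.of("solr1"), roles).orElse("none")); // solr1
    }
}
```

Whether the fallback is acceptable, or whether the overseer should instead refuse to run on a non-overseer node, is exactly the kind of choice this message proposes leaving to the overseer implementation.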

> Net new functionality provided by a newly added role (which zookeeper
> will be) should be off by default. This, however, is controlled by the
> startup routine. In code we may need to list back-compat roles separately
> from new roles,

This is why I was advocating that we 1) assume "data" as the default role,
2) not assume the overseer role to be implicitly defined (because of the way
the overseer role is written today), and 3) not assume any future roles to
be enabled by default.
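The three defaults above can be sketched as a startup-time resolution step. This is a minimal Java sketch, assuming the raw node.roles value is null when the node was started without it; RoleDefaults and resolve are hypothetical names, not Solr code. It also follows Gus's point that the resolved result should look no different from an explicitly assigned role list.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of startup-time role resolution under the defaults proposed here:
// 1) "data" is the only role assumed when node.roles is absent,
// 2) "overseer" is never implied, 3) roles added in later versions are never implied.
final class RoleDefaults {
    static final Set<String> DEFAULT_ROLES = Set.of("data");

    // rawValue is the node.roles setting, or null if the node was started without it.
    static Set<String> resolve(String rawValue) {
        if (rawValue == null || rawValue.isBlank()) {
            return DEFAULT_ROLES;
        }
        Set<String> roles = new LinkedHashSet<>();
        for (String token : rawValue.split(",")) {
            String role = token.trim();
            if (!role.isEmpty()) {
                roles.add(role);
            }
        }
        // The resolved set is indistinguishable from an explicitly assigned one.
        return Collections.unmodifiableSet(roles);
    }

    public static void main(String[] args) {
        // A node started without -Dnode.roles keeps acting as a plain data node,
        // even after upgrading to a version that knows about more roles.
        System.out.println(resolve(System.getProperty("node.roles")));
        System.out.println(resolve("overseer")); // [overseer]
    }
}
```

With these defaults, the Solr1-100 / Solr101 scenario quoted below needs only Solr101 to be started with -Dnode.roles=overseer; the other 100 nodes already resolve to data-only without a restart.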


On Mon, Nov 1, 2021 at 9:58 PM Ishan Chattopadhyaya <
[email protected]> wrote:

> > This person has not read the documentation we certainly will provide.
> Docs would clearly state that the correct way to achieve this is
> > Solr1-100:  -Dnode.roles=data
> > Solr101: -Dnode.roles=overseer
>
> Yes, and that would require the user to restart all 100 nodes in the cluster.
> That is totally avoidable, especially considering such operations become
> necessary during production outages.
>
>
> On Mon, Nov 1, 2021 at 9:53 PM Gus Heck <[email protected]> wrote:
>
>>
>>
>> On Mon, Nov 1, 2021 at 12:00 PM Ishan Chattopadhyaya <
>> [email protected]> wrote:
>>
>>> > Ilan: A node not having node.roles defined should be assumed to have
>>> all roles. Not only data. I don't see a reason to special case this one or
>>> any role.
>>> > Gus: There should be no "assumptions" Nothing to figure out. A node
>>> has a role or not. For back compatibility reasons, all roles would be
>>> assumed on startup if none specified.
>>> > Jan: No role == all roles. Explicit list of roles = exactly those
>>> roles.
>>>
>>> The problem with this approach mainly has to do with backcompat.
>>>
>>> *1. Overseer backcompat:*
>>> If we don't make any modifications to how overseer works and adopt this
>>> approach (as quoted), then imagine this situation:
>>>
>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>
>>
>> This person has not read the documentation we certainly will provide.
>> Docs would clearly state that the correct way to achieve this is
>>
>> Solr1-100:  -Dnode.roles=data
>> Solr101: -Dnode.roles=overseer
>>
>>
>>>
>>> The user wants Solr101 to be a dedicated overseer, but for that to
>>> happen, they would need to restart all the data nodes with
>>> -Dnode.roles=data. This will cause unnecessary disruption to running
>>> clusters where a dedicated overseer is needed. Keep in mind, a user who
>>> needs a dedicated overseer is likely in an emergency situation, and
>>> restarting the whole cluster might not be viable for them.
>>>
>>
>> You say it's unnecessary, I disagree. I think it is necessary. Roles
>> should never change without restart. 1-100 should not suddenly become
>> ineligible when you start 101. Certainly that leaves you with a situation
>> where at least briefly the overseer is on an invalid node. Keep in mind
>> none of this takes effect until they upgrade in the first place.
>>
>>
>>>
>>> *2. Future roles might not be compatible with this "assumed to have all
>>> roles" idea:*
>>> Take the proposed "zookeeper" role for example. Today, regular nodes are
>>> not supposed to have embedded ZK running on them. By introducing this
>>> artificial limitation ("assumed to have all roles"), we constrain adoption
>>> of all future roles to necessarily require a full cluster restart.
>>>
>>> Keep in mind newer Solr versions can introduce new capabilities and
>>> roles. Imagine a role that is defined in a new Solr version (with
>>> functionality to go with that role), and a user upgrades to that
>>> version, but their nodes were all started without the node.roles param.
>>> If those nodes are "assumed to have all roles", then just by virtue
>>> of upgrading to this new version, new capabilities will be turned on for
>>> the entire cluster, whether or not the user opted for such a capability.
>>> This is totally undesirable.
>>>
>>>
>> Yes, good point; I didn't make clear that I was thinking of current
>> functionality. Net new functionality provided by a newly added role (which
>> zookeeper will be) should be off by default. This, however, is controlled by
>> the startup routine. In code we may need to list back-compat roles
>> separately from new roles, but that distinction should be isolated to the
>> startup routines, and nodes should clearly express the final result
>> of startup in a manner indistinguishable from manually assigned roles. Also,
>> if we want to associate a currently disabled-by-default functionality with
>> a role, that role needs to default to off.
>>
>>
>>> > Gus: I actually don't want a coordinator to do more work, I would
>>> prefer small focused roles with names that accurately describe their
>>> function. In that light, COORDINATOR might be too nebulous. How about
>>> AGGREGATOR role? (what I was thinking of would better be called a
>>> QUERY_ANALYSIS role)
>>>
>>> If you want to do specific things like query analysis or query
>>> aggregation or bulk indexing etc, all of those can be done on COORDINATOR
>>> nodes (as is the case in Elasticsearch). Having tens of "small focused
>>> roles" defined as first-class concepts would be confusing to the user. As a
>>> remedy to your situation where you want the coordinator role to also do
>>> query-analysis for shards, one possible solution is to send such a query to
>>> a coordinator node with a parameter like "coordinator.query_analysis=true",
>>> and then the coordinator, instead of blindly hitting remote shards, also
>>> does some extra work on behalf of the shards.
>>>
>>
>> But what if I DON'T want any special treatment of aggregation?
>> Simplifying things to a level simpler than reality just moves the
>> complexity around. In this case it would move complexity from setup to
>> end-user custom code required to avoid baked-in functionality.
>>
>>
>>>
>>>
>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
>>> [email protected]> wrote:
>>>
>>>> > If we make collections role-aware for example (replicas of that
>>>> collection can only be
>>>> > placed on nodes with a specific role, in addition to the other role
>>>> based constraints),
>>>> > the set of roles should be user extensible and not fixed.
>>>> > If collections are not role aware, the constraints introduced by
>>>> roles apply to all collections
>>>> > equally which might be insufficient if a user needs for example a
>>>> heavily used collection to
>>>> > only be placed on more powerful nodes.
>>>>
>>>> I feel node roles and role-aware collections are orthogonal topics.
>>>> What you describe above can be achieved by the autoscaling+replica
>>>> placement framework where the placement plugins take the node roles as one
>>>> of the inputs.
>>>>
>>>> > It does impact the design from early on: the set of roles needs to be
>>>> expandable by a user
>>>> > by creating a collection with new roles for example (consumed by
>>>> placement plugins) and be
>>>> > able to start nodes with new (arbitrary) roles. Should such roles
>>>> follow some naming syntax to
>>>> > differentiate them from built in roles? To be able to fail on typos
>>>> on roles - that otherwise can be
>>>> > crippling and hard to debug. This implies in any case that the
>>>> current design can't assume all
>>>> > roles are known at compile time or define them in a Java enum.
>>>>
>>>> I think this should be achieved by something different from roles.
>>>> Something like node *labels* (user defined) which can then be used in
>>>> a replica placement plugin to assign replicas. I see roles as more closely
>>>> associated with kinds of functionality a node is designated for. Therefore,
>>>> I feel that replica placement and user-defined node labels are out of scope
>>>> for this SIP. They can be added later in a separate SIP, without being at
>>>> odds with this proposal.
>>>>
>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <[email protected]> wrote:
>>>>> > A node not having node.roles defined should be assumed to have all
>>>>> roles. Not only data. I don't see a reason to special case this one or any
>>>>> role.
>>>>>
>>>>> +1, make it simple and transparent. No role == all roles. Explicit
>>>>> list of roles = exactly those roles.
>>>>>
>>>>> > (Gus) See my comment above, but maybe preference is something
>>>>> handled as a feature of the role rather than via role designation?
>>>>>
>>>>> Yeah, we always need an overseer, so that feature can decide to use its
>>>>> list of nodes as a preference if it so chooses.
>>>>>
>>>>>
>>>>> Aside: I think it makes things easier if we always prefix Solr env vars
>>>>> and sys props with "SOLR_" or "solr.", e.g. -Dsolr.node.roles=foo. That
>>>>> way we can get away from having explicit code in bin/solr, bin/solr.cmd
>>>>> and SolrCLI to manage every single property. Instead we can parse all
>>>>> env vars and props with the solr prefix in our bootstrap code. And by
>>>>> convention we can allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9,
>>>>> and it would be the same thing...
>>>>>
>>>>> Jan
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
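Jan's aside above about prefixing env vars and sys props could look roughly like the following in bootstrap code. This Java sketch rests on assumptions not stated in the thread: an env var name maps to a property name by lowercasing and turning underscores into dots, and an explicit solr.* system property wins over the matching SOLR_* env var. SolrPrefixedConfig and its methods are hypothetical and are not taken from bin/solr, bin/solr.cmd or SolrCLI.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch of the prefix convention: every SOLR_* environment variable and
// every solr.* system property is picked up generically at bootstrap, so that
// SOLR_NODE_ROLES and -Dsolr.node.roles are treated as the same setting.
final class SolrPrefixedConfig {

    // Maps an env var name like SOLR_NODE_ROLES to the property name solr.node.roles.
    static String envToProperty(String envName) {
        return envName.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    // Assumption: explicit system properties override environment variables.
    static Map<String, String> collect(Map<String, String> env, Properties sysProps) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith("SOLR_")) {
                result.put(envToProperty(e.getKey()), e.getValue());
            }
        }
        for (String name : sysProps.stringPropertyNames()) {
            if (name.startsWith("solr.")) {
                result.put(name, sysProps.getProperty(name)); // overrides any env-derived value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = collect(System.getenv(), System.getProperties());
        System.out.println(config.get("solr.node.roles"));
    }
}
```

One catch with a purely mechanical mapping is that a property name which itself contains an underscore cannot round-trip through an env var name, so a real convention would need an explicit rule for that case.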
