Re: First class support for node roles

Timothy Potter Tue, 02 Nov 2021 17:27:01 -0700

One last thought on this for me ... I think it would be beneficial for
the SIP to address how this new feature will work with the existing
shards.preference solution and affinity based placement plugin.
Moreover, your pseudo-replica solution sounds like a new replica type
vs. a node level thing. The placement strategy can place replicas
based on replica type and node type (just a system property), so
please address why you can't achieve a query coordinator behavior with
a new replica type + improvements to the Affinity placement plugin?


Cheers,
Tim


On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya
<[email protected]> wrote:
>
> Also, in a cluster where new collections/shards/replicas are continuously 
> added all the time, it would be pretty awkward to start a node (in regular 
> mode), briefly have it become eligible for replica assignment, then invoke 
> a replica placement rule/autoscaling policy for that node to not place 
> replicas on it. Instead, starting a node with a defined role (as a startup 
> param) precludes that brief period of eligibility for replica placement on 
> such a node.
>
> On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya 
> <[email protected]> wrote:
>>
>> If we were to tell users how to do "scatter gather on an empty node", *how 
>> exactly* would you recommend users have an empty node to begin with? 
>> Wouldn't you say something like "for 8x you can do this (rule based replica 
>> placement) or do that (autoscaling), but for 9x you do this new thing". 
>> Having a node that doesn't have a data role seems like a consistent and an 
>> elegant way for users to invoke such a functionality and also easily relate 
>> to a broad concept, without having to deal with autoscaling frameworks of 
>> the ancient past, medieval past or the future.
>>
>> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <[email protected]> wrote:
>>>
>>> As opposed to what? Looking up the configset for the addressed
>>> collection and pulling whatever information it needs from cached data.
>>> I'm sure there are some nuances but I hardly think you need a node
>>> role framework to deal with determining the unique key field to do
>>> scatter gather on an empty node when you have easy access to
>>> collection metadata.
>>>
>>> Doesn't seem like a hard thing to overcome to me.
>>>
>>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <[email protected]> wrote:
>>> >>
>>> >> I'm not missing the point of the query coordinator, but I actually
>>> >> didn't realize that an empty Solr node would forward the top-level
>>> >> request onward instead of just being the query controller itself? That
>>> >> actually seems like a bug vs. a feature, IMO any node that receives
>>> >> the top-level query should just be the coordinator, what stops it?
>>> >
>>> >
>>> > To process a request there should be a core that uses the same configset 
>>> > as the requested collection.
>>> >>
>>> >>
>>> >> Anyway, it sounds to me like you guys have your minds made up
>>> >> regardless of feedback.
>>> >>
>>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in your SIP as a
>>> >> specific role, not sure why you took that as me wanting to discuss the
>>> >> embedded ZK in your SIP?
>>> >>
>>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya
>>> >> <[email protected]> wrote:
>>> >> >
>>> >> > Hi Tim,
>>> >> > Here are my responses inline.
>>> >> >
>>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <[email protected]> 
>>> >> > wrote:
>>> >> >>
>>> >> >> I'm just not convinced this feature is even needed and the SIP is not
>>> >> >> convincing that "There is no proper alternative today."
>>> >> >
>>> >> >
>>> >> > There are no proper alternatives today, just hacks. On 8x, we have two 
>>> >> > different deprecated frameworks to stop replicas from being placed on a 
>>> >> > node (1. rule based replica placement, 2. autoscaling framework). On 
>>> >> > 9x, we have a new autoscaling framework, which I don't even think is 
>>> >> > fully implemented. And, there's definitely no way to have a node act 
>>> >> > as a query coordinator without having data on it.
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles, doesn't
>>> >> >> mean Solr needs this.
>>> >> >
>>> >> >
>>> >> > Solr needs this. That Elastic has such concepts is a coincidence, and also 
>>> >> > means we have an opportunity to catch up with them; they have these 
>>> >> > concepts for a reason.
>>> >> >
>>> >> >>
>>> >> >> Also, some of Elastic's roles overlap with
>>> >> >> concepts Solr already has in a different form, i.e data_hot sounds
>>> >> >> like NRT and data_warm sounds a lot like our Pull Replica Type
>>> >> >
>>> >> >
>>> >> > I think that is beyond the scope of this SIP.
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> 2) You can achieve the "coordinator" role with auto-scaling rules
>>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it even has a node
>>> >> >> type built in: 
>>> >> >> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP)).
>>> >> >> Simply build your replica placement rules such that no replicas land
>>> >> >> on "coordinator" nodes. And you can route queries using node.sysprop
>>> >> >> already using shards.preference.
>>> >> >
>>> >> >
>>> >> > I think you missed the whole point of the query coordinator. Please 
>>> >> > refer to this https://issues.apache.org/jira/browse/SOLR-15715.
>>> >> > Let me summarize the main difference between what (I think) you refer 
>>> >> > to and what is proposed in SOLR-15715.
>>> >> >
>>> >> > With your suggestion, we'll have a node that doesn't host any 
>>> >> > replicas. And you suggest queries landing on such nodes be routed 
>>> >> > using shards.preference? Well, in such a case, these queries will be 
>>> >> > forwarded/proxied to a random node hosting a replica of the collection 
>>> >> > and that node then acts as the coordinator. This situation is no 
>>> >> > better than sending the query directly to that particular node.
>>> >> >
>>> >> > What is proposed in SOLR-15715 is a query aggregation functionality. 
>>> >> > There will be pseudo replicas (aware of the configset) on this 
>>> >> > coordinator node that handle the request themselves, send shard 
>>> >> > requests to data hosting replicas, collect responses and merge them, 
>>> >> > and send the result back to the user. This merge step is usually extremely 
>>> >> > memory intensive, and it would be good to serve these off stateless 
>>> >> > nodes (that host no data).
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> 3) Dedicated overseer role? I thought we were removing the overseer?!?
>>> >> >> Also, we already have the ability to run the overseer on specific
>>> >> >> nodes w/o a new framework, so this doesn't really convince me we need
>>> >> >> a new framework.
>>> >> >
>>> >> >
>>> >> > There's absolutely no change proposed to the "overseer" role. What 
>>> >> > users need on production clusters are nodes dedicated for overseer 
>>> >> > operations, and for that the current "overseer" role suffices, 
>>> >> > together with some functionality to not place replicas on such nodes.
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> 4) We will indeed need to decide which nodes host embedded Zookeepers
>>> >> >> but I'd argue that solution hasn't been designed entirely and we
>>> >> >> probably don't need a formal node role framework to determine which
>>> >> >> nodes host embedded ZKs. Moreover, embedded ZK seems more like a small
>>> >> >> cluster thing and anyone running a large cluster will probably have a
>>> >> >> dedicated ZK ensemble as they do today. The node role thing seems like
>>> >> >> it's intended for large clusters and my gut says few will use embedded
>>> >> >> ZK for large clusters.
>>> >> >
>>> >> >
>>> >> > This SIP is not the right place for this discussion. There's a 
>>> >> > separate SIP for this.
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> 5) You can also achieve a lot of "node role" functionality in query
>>> >> >> routing using the shards.preference parameter.
>>> >> >>
>>> >> >
>>> >> > That doesn't solve the purpose behind 
>>> >> > https://issues.apache.org/jira/browse/SOLR-15715.
>>> >> >
>>> >> >>
>>> >> >> At the very least, the SIP needs to list specific use cases that
>>> >> >> require this feature that are not achievable with the current features
>>> >> >> before getting bogged down in the impl. details.
>>> >> >
>>> >> >
>>> >> > The coordinator role is the biggest motivation for introducing the 
>>> >> > concept of roles. However, in addition to what is proposed in 
>>> >> > SOLR-15715, a coordinator node can later on also be used as a node for 
>>> >> > users to run streaming expressions on, do bulk indexing on (impl 
>>> >> > details for this to come later, don't want distraction here).
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> Tim
>>> >> >>
>>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <[email protected]> wrote:
>>> >> >> >
>>> >> >> > I think there are things not yet accounted for. Time I spent 
>>> >> >> > yesterday is biting me today. Pls give a couple days.
>>> >> >> >
>>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski 
>>> >> >> > <[email protected]> wrote:
>>> >> >> >>
>>> >> >> >> Hey Ishan,
>>> >> >> >>
>>> >> >> >> I appreciate you writing up the SIP!  Here's some notes/questions I
>>> >> >> >> had as I was reading through your writeup and this mail thread.
>>> >> >> >> ("----" separators between thoughts, hopefully that helps.)
>>> >> >> >>
>>> >> >> >> ----
>>> >> >> >>
>>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
>>> >> >> >> suggested: roles should default to "all-on".  I see the downsides
>>> >> >> >> you're worried about with that approach (esp. around 'overseer'), but
>>> >> >> >> they may be mitigatable, at least in part.
>>> >> >> >>
>>> >> >> >> > [mail thread] User wants this node Solr101 to be a dedicated 
>>> >> >> >> > overseer, but for that to happen, he/she would need to restart 
>>> >> >> >> > all the data nodes with -Dnode.roles=data
>>> >> >> >>
>>> >> >> >> Sure, if roles can only be specified at startup.  But that may be a
>>> >> >> >> self-imposed constraint.
>>> >> >> >>
>>> >> >> >> An API to change a node's roles would remove the need for a restart
>>> >> >> >> and make it easy for users to affect the semantics they want.  You
>>> >> >> >> decided you want a dedicated overseer N nodes into your cluster
>>> >> >> >> deployment?  Deploy node 'N' with the 'overseer', and toggle the
>>> >> >> >> overseer role off on the remainder.
>>> >> >> >>
>>> >> >> >> Now, I understand that you don't want roles to change at runtime, but
>>> >> >> >> I haven't seen you get much into "why", beyond saying "it is very
>>> >> >> >> risky to have nodes change roles while they are up and running."  Can
>>> >> >> >> you expand a bit on the risks you're worried about?  If you're
>>> >> >> >> explicit about them here maybe someone can think of a clever way to
>>> >> >> >> address them?
>>> >> >> >>
>>> >> >> >> > Hence, if those nodes are "assumed to have all roles", then just 
>>> >> >> >> > by virtue of upgrading to this new version, new capabilities 
>>> >> >> >> > will be turned on for the entire cluster, whether or not the 
>>> >> >> >> > user opted for such a capability. This is totally undesirable.
>>> >> >> >>
>>> >> >> >> Obviously "roles" refer to much bigger chunks of functionality than
>>> >> >> >> usual, so in a sense defaulting roles on is scarier.  But in a sense
>>> >> >> >> you're describing something that's an inherent part of software
>>> >> >> >> releases.  Releases expose new features that are typically on by
>>> >> >> >> default.  A new default-on role in 9.1 might hurt a user, but there's
>>> >> >> >> no fundamental difference between that and a change to backups or
>>> >> >> >> replication or whatever in the same release.
>>> >> >> >>
>>> >> >> >> I don't mean to belittle the difference in scope - I get your concern.
>>> >> >> >> But IMO this is something to address with good release notes and
>>> >> >> >> documentation.  Designing for admins who don't do even cursory
>>> >> >> >> research before an upgrade ties both our hands behind our back as a
>>> >> >> >> project.
>>> >> >> >>
>>> >> >> >> ----
>>> >> >> >>
>>> >> >> >> > [SIP] Internal representation in ZK ... Implementation details 
>>> >> >> >> > like these can be fleshed out in the PR
>>> >> >> >>
>>> >> >> >> IMO this is important enough to flesh out as part of the SIP, at least
>>> >> >> >> in broad strokes.  It affects backcompat, SolrJ client design, etc.
>>> >> >> >>
>>> >> >> >> ----
>>> >> >> >>
>>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>> >> >> >>
>>> >> >> >> Woohoo - way to include a v2 API definition!
>>> >> >> >>
>>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET
>>> >> >> >> /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the
>>> >> >> >> "get the roles this node has" functionality.  Though I leave that for
>>> >> >> >> your consideration.
>>> >> >> >>
>>> >> >> >> ----
>>> >> >> >>
>>> >> >> >> Looking forward to your responses and seeing the SIP progress!  It's a
>>> >> >> >> really cool, promising idea IMO.
>>> >> >> >>
>>> >> >> >> Best,
>>> >> >> >>
>>> >> >> >> Jason
>>> >> >> >>
>>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
>>> >> >> >> <[email protected]> wrote:
>>> >> >> >> >
>>> >> >> >> > Are there any unaddressed outstanding concerns that we should 
>>> >> >> >> > hold up the SIP for?
>>> >> >> >> >
>>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, 
>>> >> >> >> > <[email protected]> wrote:
>>> >> >> >> >>>
>>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query 
>>> >> >> >> >>> >> analysis" has a role of its own. Where would that lead us 
>>> >> >> >> >>> >> to? Separate roles for
>>> >> >> >> >>>
>>> >> >> >> >>> >> nodes that do "faceting" or "spell correction" etc.? But 
>>> >> >> >> >>> >> anyway, that is for discussion when we add future roles. 
>>> >> >> >> >>> >> This is beyond this SIP.
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> > I am not asking you to implement every possible role of 
>>> >> >> >> >> > course :). As a note I know a company that is running an 
>>> >> >> >> >> > entire separate
>>> >> >> >> >> > cluster to offload and better serve highlighting on a subset 
>>> >> >> >> >> > of large docs, so YES I think there are people who may want 
>>> >> >> >> >> > such fine grained control.
>>> >> >> >> >>
>>> >> >> >> >> Cool, I think we can discuss adding any additional roles (for 
>>> >> >> >> >> highlighting?) on a case by case basis at a later point.
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya 
>>> >> >> >> >> <[email protected]> wrote:
>>> >> >> >> >>>
>>> >> >> >> >>> > Boiling it down the idea I'm proposing is that roles 
>>> >> >> >> >>> > required for back compatibility get explicitly added on 
>>> >> >> >> >>> > startup, if not by the user then by the code. This is more 
>>> >> >> >> >>> > flexible than assuming that no role means every role, 
>>> >> >> >> >>> > because then every new feature that has a role will end up 
>>> >> >> >> >>> > on legacy clusters which are also not back compatible.
>>> >> >> >> >>>
>>> >> >> >> >>> +1, I totally agree. I even said so, when I said: "This is why 
>>> >> >> >> >>> I was advocating that 1) we assume the "data" as a default, 2) 
>>> >> >> >> >>> not assume overseer to be implicitly defined (because of the 
>>> >> >> >> >>> way overseer role is written today), 3) not assume any future 
>>> >> >> >> >>> roles to be true by default."
>>> >> >> >> >>>
>>> >> >> >> >>> So, basically, I'm proposing that the "roles required for back 
>>> >> >> >> >>> compatibility" (that should be explicitly added on startup) be 
>>> >> >> >> >>> just the ["data"] role, and not the "overseer" role (due to 
>>> >> >> >> >>> the way overseer role is currently defined, i.e. it is 
>>> >> >> >> >>> "preferred overseer").
>>> >> >> >> >>>
>>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <[email protected]> 
>>> >> >> >> >>> wrote:
>>> >> >> >> >>>>
>>> >> >> >> >>>> Very sorry don't mean to sound offended, Frustrated yes 
>>> >> >> >> >>>> offended no :)... the most difficult thing about 
>>> >> >> >> >>>> communication is the illusion it has occurred :)
>>> >> >> >> >>>>
>>> >> >> >> >>>> If you read back just a few emails you'll see where I talk 
>>> >> >> >> >>>> about roles being applied on startup. Boiling it down the 
>>> >> >> >> >>>> idea I'm proposing is that roles required for back 
>>> >> >> >> >>>> compatibility get explicitly added on startup, if not by the 
>>> >> >> >> >>>> user then by the code. This is more flexible than assuming 
>>> >> >> >> >>>> that no role means every role, because then every new feature 
>>> >> >> >> >>>> that has a role will end up on legacy clusters which are also 
>>> >> >> >> >>>> not back compatible.
>>> >> >> >> >>>>
>>> >> >> >> >>>> There are points where I said all roles rather than back 
>>> >> >> >> >>>> compatibility roles because I was thinking about back 
>>> >> >> >> >>>> compatibility specifically, but you can't know that if I 
>>> >> >> >> >>>> don't say that can you :).
>>> >> >> >> >>>>
>>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya 
>>> >> >> >> >>>> <[email protected]> wrote:
>>> >> >> >> >>>>>
>>> >> >> >> >>>>> > If you read more closely, my way can provide full back 
>>> >> >> >> >>>>> > compatibility. To say or imply it doesn't isn't helping. 
>>> >> >> >> >>>>> > Perhaps you need to re-read?
>>> >> >> >> >>>>>
>>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm trying my 
>>> >> >> >> >>>>> best. Please don't be offended, and kindly point me to the 
>>> >> >> >> >>>>> exact part you want me to re-read.
>>> >> >> >> >>>>>
>>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck 
>>> >> >> >> >>>>> <[email protected]> wrote:
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya 
>>> >> >> >> >>>>>> <[email protected]> wrote:
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Positive - They denote the existence of a capability
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Agree, the SIP already reflects this.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >   Absolute - Absence/Presence binary identification of a 
>>> >> >> >> >>>>>>> > capability; no implications, no assumptions
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Disagree, we need backcompat handling on nodes running 
>>> >> >> >> >>>>>>> without any roles. There has to be an implicit assumption 
>>> >> >> >> >>>>>>> as to what roles are those nodes assumed to have. My 
>>> >> >> >> >>>>>>> proposal is that only the "data" role be assumed, but not 
>>> >> >> >> >>>>>>> the "overseer" role. For any future roles ("coordinator", 
>>> >> >> >> >>>>>>> "zookeeper" etc.), this decision as to what absence of any 
>>> >> >> >> >>>>>>> role implies should be left to the implementation of that 
>>> >> >> >> >>>>>>> future role. Documentation should clearly reflect 
>>> >> >> >> >>>>>>> these implicit assumptions.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> If you read more closely, my way can provide full back 
>>> >> >> >> >>>>>> compatibility. To say or imply it doesn't isn't helping. 
>>> >> >> >> >>>>>> Perhaps you need to re-read?
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Focused - Do one thing per role
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where "query 
>>> >> >> >> >>>>>>> analysis" has a role of its own. Where would that lead us 
>>> >> >> >> >>>>>>> to? Separate roles for nodes that do "faceting" or "spell 
>>> >> >> >> >>>>>>> correction" etc.? But anyway, that is for discussion when 
>>> >> >> >> >>>>>>> we add future roles. This is beyond this SIP.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> I am not asking you to implement every possible role of 
>>> >> >> >> >>>>>> course :). As a note I know a company that is running an 
>>> >> >> >> >>>>>> entire separate cluster to offload and better serve 
>>> >> >> >> >>>>>> highlighting on a subset of large docs, so YES I think 
>>> >> >> >> >>>>>> there are people who may want such fine grained control.
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Accessible - It should be dead simple to determine 
>>> >> >> >> >>>>>>> > the members of a role, avoid parsing blobs of json, 
>>> >> >> >> >>>>>>> > avoid calculating implications, avoid consulting other 
>>> >> >> >> >>>>>>> > resources after listing nodes with the role
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Agree. I'm open to any implementation details that make it 
>>> >> >> >> >>>>>>> easy. There should be a reasonable API to return these 
>>> >> >> >> >>>>>>> node roles, with ability to filter by role or filter by 
>>> >> >> >> >>>>>>> node.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Independent - One role should not require other roles 
>>> >> >> >> >>>>>>> > to be present
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Do we need to have this hard and fast requirement upfront? 
>>> >> >> >> >>>>>>> There might be situations where this is desirable. I feel 
>>> >> >> >> >>>>>>> we can discuss on a case by case basis whenever a future 
>>> >> >> >> >>>>>>> role is added.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Persistent - roles should not be lost across reboot
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Agree.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Immutable - roles should not change while the node is 
>>> >> >> >> >>>>>>> > running
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> Agree
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> >    Lively - A node with a capability may not be 
>>> >> >> >> >>>>>>> > presently providing that capability.
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> Specifically imagine the case where there are 100 nodes:
>>> >> >> >> >>>>>> 1-100 ==> DATA
>>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want only one of 
>>> >> >> >> >>>>>> those to be providing overseer functionality and the other 
>>> >> >> >> >>>>>> two to be capable, but not providing (so that if the 
>>> >> >> >> >>>>>> current overseer goes down a new one can be assigned).
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 
>>> >> >> >> >>>>>> 107-108 with that role, but you probably want to ensure 
>>> >> >> >> >>>>>> that zookeepers require some sort of command for them to 
>>> >> >> >> >>>>>> actually join the zookeeper cluster (i.e. 
>>> >> >> >> >>>>>> /admin?action=ZKADD&nodes=node107,node18) ... to do that 
>>> >> >> >> >>>>>> the nodes need to be up. But oh look I typoed 108... we 
>>> >> >> >> >>>>>> want that to fail... how? because 18 does not have the 
>>> >> >> >> >>>>>> capability to become a zookeeper.
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>>
>>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya 
>>> >> >> >> >>>>>>> <[email protected]> wrote:
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles defined should be 
>>> >> >> >> >>>>>>>> > assumed to have all roles. Not only data. I don't see a 
>>> >> >> >> >>>>>>>> > reason to special case this one or any role.
>>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions" Nothing to figure 
>>> >> >> >> >>>>>>>> > out. A node has a role or not. For back compatibility 
>>> >> >> >> >>>>>>>> > reasons, all roles would be assumed on startup if none 
>>> >> >> >> >>>>>>>> > specified.
>>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = 
>>> >> >> >> >>>>>>>> > exactly those roles.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> Problem with this approach is mainly to do with 
>>> >> >> >> >>>>>>>> backcompat.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> 1. Overseer backcompat:
>>> >> >> >> >>>>>>>> If we don't make any modifications to how overseer works 
>>> >> >> >> >>>>>>>> and adopt this approach (as quoted), then imagine this 
>>> >> >> >> >>>>>>>> situation:
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated 
>>> >> >> >> >>>>>>>> overseer)
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> User wants this node Solr101 to be a dedicated overseer, 
>>> >> >> >> >>>>>>>> but for that to happen, he/she would need to restart all 
>>> >> >> >> >>>>>>>> the data nodes with -Dnode.roles=data. This will cause 
>>> >> >> >> >>>>>>>> unnecessary disruption to running clusters where a 
>>> >> >> >> >>>>>>>> dedicated overseer is needed. Keep in mind, if a user 
>>> >> >> >> >>>>>>>> needs a dedicated overseer, he's likely in an emergency 
>>> >> >> >> >>>>>>>> situation and restarting the whole cluster might not be 
>>> >> >> >> >>>>>>>> viable for him/her.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with this 
>>> >> >> >> >>>>>>>> "assumed to have all roles" idea:
>>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for example. Today, 
>>> >> >> >> >>>>>>>> regular nodes are not supposed to have embedded ZK 
>>> >> >> >> >>>>>>>> running on them. By introducing this artificial 
>>> >> >> >> >>>>>>>> limitation ("assumed to have all roles"), we constrain 
>>> >> >> >> >>>>>>>> adoption of all future roles to necessarily require a 
>>> >> >> >> >>>>>>>> full cluster restart.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can introduce new 
>>> >> >> >> >>>>>>>> capabilities and roles. Imagine we have a role that is 
>>> >> >> >> >>>>>>>> defined in a new Solr version (and there's functionality 
>>> >> >> >> >>>>>>>> to go with that role), and user upgrades to that version. 
>>> >> >> >> >>>>>>>> However, his/her nodes all were started with no 
>>> >> >> >> >>>>>>>> node.roles param. Hence, if those nodes are "assumed to 
>>> >> >> >> >>>>>>>> have all roles", then just by virtue of upgrading to this 
>>> >> >> >> >>>>>>>> new version, new capabilities will be turned on for the 
>>> >> >> >> >>>>>>>> entire cluster, whether or not the user opted for such a 
>>> >> >> >> >>>>>>>> capability. This is totally undesirable.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator to do more 
>>> >> >> >> >>>>>>>> > work, I would prefer small focused roles with names 
>>> >> >> >> >>>>>>>> > that accurately describe their function. In that light, 
>>> >> >> >> >>>>>>>> > COORDINATOR might be too nebulous. How about AGGREGATOR 
>>> >> >> >> >>>>>>>> > role? (what I was thinking of would better be called a 
>>> >> >> >> >>>>>>>> > QUERY_ANALYSIS role)
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> If you want to do specific things like query analysis or 
>>> >> >> >> >>>>>>>> query aggregation or bulk indexing etc, all of those can 
>>> >> >> >> >>>>>>>> be done on COORDINATOR nodes (as is the case in 
>>> >> >> >> >>>>>>>> ElasticSearch). Having tens of "small focused roles" 
>>> >> >> >> >>>>>>>> defined as first class concepts would be confusing to the 
>>> >> >> >> >>>>>>>> user. As a remedy to your situation where you want the 
>>> >> >> >> >>>>>>>> coordinator role to also do query-analysis for shards, 
>>> >> >> >> >>>>>>>> one possible solution is to send such a query to a 
>>> >> >> >> >>>>>>>> coordinator node with a parameter like 
>>> >> >> >> >>>>>>>> "coordinator.query_analysis=true", and then the 
>>> >> >> >> >>>>>>>> coordinator, instead of blindly hitting remote shards, 
>>> >> >> >> >>>>>>>> also does some extra work on behalf of the shards.
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>>
>>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya 
>>> >> >> >> >>>>>>>> <[email protected]> wrote:
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>> > If we make collections role-aware for example 
>>> >> >> >> >>>>>>>>> > (replicas of that collection can only be
>>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in addition to 
>>> >> >> >> >>>>>>>>> > the other role based constraints),
>>> >> >> >> >>>>>>>>> > the set of roles should be user extensible and not 
>>> >> >> >> >>>>>>>>> > fixed.
>>> >> >> >> >>>>>>>>> > If collections are not role aware, the constraints 
>>> >> >> >> >>>>>>>>> > introduced by roles apply to all collections
>>> >> >> >> >>>>>>>>> > equally which might be insufficient if a user needs 
>>> >> >> >> >>>>>>>>> > for example a heavily used collection to
>>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>> I feel node roles and role-aware collections are 
>>> >> >> >> >>>>>>>>> orthogonal topics. What you describe above can be 
>>> >> >> >> >>>>>>>>> achieved by the autoscaling+replica placement framework 
>>> >> >> >> >>>>>>>>> where the placement plugins take the node roles as one 
>>> >> >> >> >>>>>>>>> of the inputs.
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>> > It does impact the design from early on: the set of 
>>> >> >> >> >>>>>>>>> > roles need to be expandable by a user
>>> >> >> >> >>>>>>>>> > by creating a collection with new roles for example 
>>> >> >> >> >>>>>>>>> > (consumed by placement plugins) and be
>>> >> >> >> >>>>>>>>> > able to start nodes with new (arbitrary) roles. Should 
>>> >> >> >> >>>>>>>>> > such roles follow some naming syntax to
>>> >> >> >> >>>>>>>>> > differentiate them from built in roles? To be able to 
>>> >> >> >> >>>>>>>>> > fail on typos on roles - that otherwise can be
>>> >> >> >> >>>>>>>>> > crippling and hard to debug. This implies in any case 
>>> >> >> >> >>>>>>>>> > that the current design can't assume all
>>> >> >> >> >>>>>>>>> > roles are known at compile time or define them in a 
>>> >> >> >> >>>>>>>>> > Java enum.
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>> I think this should be achieved by something different 
>>> >> >> >> >>>>>>>>> from roles. Something like node labels (user defined) 
>>> >> >> >> >>>>>>>>> which can then be used in a replica placement plugin to 
>>> >> >> >> >>>>>>>>> assign replicas. I see roles as more closely associated 
>>> >> >> >> >>>>>>>>> with kinds of functionality a node is designated for. 
>>> >> >> >> >>>>>>>>> Therefore, I feel that replica placements and user 
>>> >> >> >> >>>>>>>>> defined node labels are out of scope for this SIP. It can 
>>> >> >> >> >>>>>>>>> be added later in a separate SIP, without being at odds 
>>> >> >> >> >>>>>>>>> with this proposal.
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>>
>>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl 
>>> >> >> >> >>>>>>>>> <[email protected]> wrote:
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan Ginzburg 
>>> >> >> >> >>>>>>>>>> > <[email protected]>:
>>> >> >> >> >>>>>>>>>> > A node not having node.roles defined should be 
>>> >> >> >> >>>>>>>>>> > assumed to have all roles. Not only data. I don't see 
>>> >> >> >> >>>>>>>>>> > a reason to special case this one or any role.
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No role == all 
>>> >> >> >> >>>>>>>>>> roles. Explicit list of roles = exactly those roles.
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is 
>>> >> >> >> >>>>>>>>>> > something handled as a feature of the role rather 
>>> >> >> >> >>>>>>>>>> > than via role designation?
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> Yea, we always need an overseer, so that feature can 
>>> >> >> >> >>>>>>>>>> decide to use its list of nodes as a preference if it 
>>> >> >> >> >>>>>>>>>> so chooses.
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we always prefix 
>>> >> >> >> >>>>>>>>>> Solr env.vars and sys.props with "SOLR_" or "solr.", 
>>> >> >> >> >>>>>>>>>> i.e. -Dsolr.node.roles=foo. That way we can get away 
>>> >> >> >> >>>>>>>>>> from having to have explicit code in bin/solr, 
>>> >> >> >> >>>>>>>>>> bin/solr.cmd and SolrCLI to manage every single 
>>> >> >> >> >>>>>>>>>> property. Instead we can parse all ENVs and Props with 
>>> >> >> >> >>>>>>>>>> the solr prefix in our bootstrap code. And we can by 
>>> >> >> >> >>>>>>>>>> convention allow e.g. docker run -e SOLR_NODE_ROLES=foo 
>>> >> >> >> >>>>>>>>>> solr:9 and it would be the same thing...
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>>>>> Jan
>>> >> >> >> >>>>>>>>>> ---------------------------------------------------------------------
>>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>> >> >> >> >>>>>>>>>> For additional commands, e-mail: 
>>> >> >> >> >>>>>>>>>> [email protected]
>>> >> >> >> >>>>>>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>>
>>> >> >> >> >>>>>> --
>>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>> >> >> >> >>>>>> http://www.the111shift.com (play)
>>> >> >> >> >>>>
>>> >> >> >> >>>>
>>> >> >> >> >>>>
>>> >> >> >> >>>> --
>>> >> >> >> >>>> http://www.needhamsoftware.com (work)
>>> >> >> >> >>>> http://www.the111shift.com (play)
>>> >> >> >>
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > http://www.needhamsoftware.com (work)
>>> >> >> > http://www.the111shift.com (play)
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>>
>>>
