Re: First class support for node roles

Noble Paul Thu, 04 Nov 2021 13:00:36 -0700

Yes, Ilan.
The coordinator is the first compelling use case. Roles are the UX, and
that's a very simple piece. The real work is coming as a separate PR.


Roles can be achieved in a clumsy way today. It's unintuitive, and we don't
want to make users jump through hoops.

I'll open a PR, and you can judge the simplicity of this SIP. It's
not going to have a major impact on any component of Solr.



On Fri, Nov 5, 2021, 2:01 AM Ilan Ginzburg <[email protected]> wrote:

> I was noting that the real value of the proposal (real value = being able
> to do things that are currently impossible with Solr) was due to an
> independent concept of a coordinator "core", and that if we had this
> (currently does not exist in Solr but apparently you do have it on a fork),
> we can achieve most/all of what the SIP proposes with existing means, i.e.
> without roles. Maybe in a less flexible/user friendly way, maybe not (given
> the details of rolling out roles are still fuzzy).
> And if we don't have the concept of coordinator core, then the roles by
> themselves do not allow much more than what is already achievable by other
> means.
>
> Ilan
>
> On Thu, Nov 4, 2021 at 12:02 PM Noble Paul <[email protected]> wrote:
>
>> The placement part of the roles feature may use the placement plugin API.
>>
>>
>> The implementation is not what we're discussing here. We need a
>> consistent story for the user when it comes to roles. This discussion is
>> about the UX rather than the impl.
>>
>> Most of our discussions are about how we should implement it.
>>
>>
>>
>> On Thu, Nov 4, 2021, 9:27 PM Ilan Ginzburg <[email protected]> wrote:
>>
>>> A lot of the value of this SIP relies on the pseudo-core thing (because
>>> placing on specific nodes is achievable today, Overseer role already
>>> exists). Roles as described without the coordinator concept are just
>>> another way to do things already possible today (with a very minor update
>>> on the Affinity placement plugin - it might even support it right away
>>> actually, didn't check).
>>> Maybe "pseudo core" should go in first and condition the rest of the
>>> work? It feels like a bigger chunk with more challenging integration issues
>>> (routing, new concept in the collection/shard/replica hierarchy).
>>>
>>> Ilan
>>>
>>> On Thu, Nov 4, 2021 at 11:20 AM Noble Paul <[email protected]> wrote:
>>>
>>>> None of the design is dictated by the version in which we implement
>>>> this. The SIP is mostly about the "what", the "why" and the UX.
>>>>
>>>> I'm not attached to any particular version. This is definitely
>>>> going to happen in 9.x. Even if it is built in 9.x, we will have to build
>>>> and support it for all versions of Solr we use internally. When we
>>>> eventually upgrade from our current version to a 9.x version, it has to be
>>>> backward compatible. Whether this is available for public consumption
>>>> as a branch/release is up for debate.
>>>>
>>>> On Thu, Nov 4, 2021, 8:28 PM Jan Høydahl <[email protected]> wrote:
>>>>
>>>>> Let's do ourselves a service and target 9.0 for roles. It's too late to
>>>>> plan new features into 8.x.
>>>>>
>>>>> I don't understand the urgency either. I can see that certain Solr
>>>>> users would wish for such a feature "yesterday", but that cannot drive our
>>>>> decisions on what version to target for features. When targeting 9.0, all
>>>>> upgrade or back-compat worries will need to be baked into the feature
>>>>> itself, so that there is either code support or good documentation for how
>>>>> to start using roles after upgrading a cluster to 9.0. Perhaps there must
>>>>> be a temporary cluster property in 9.0, "enableRoles=false", that can be
>>>>> set even if all 9.0 nodes are given roles on startup. Then, initially after
>>>>> the upgrade, the cluster behaves as it did in 8.x. Once you are ready to
>>>>> enforce roles, you can flip the cluster property, and placement and
>>>>> routing start using roles. In 10.0 that property can then go away.
>>>>>
>>>>> When it comes to placement plugins, we can document that they MUST
>>>>> respect certain node roles (at least the data role), and treat it as a bug
>>>>> if they don't.
>>>>>
>>>>> Jan
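Jan's transition idea can be sketched as a simple gate. This is a minimal illustration, not Solr code: the `enableRoles` property name comes from Jan's suggestion, while the class and method names, treating a missing property as `false`, and the in-memory maps are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class RolesFeatureFlagSketch {

    // Hypothetical cluster property named after Jan's suggestion; not an existing Solr property.
    static final String ENABLE_ROLES_PROP = "enableRoles";

    /**
     * Returns the nodes that may receive new replicas. While the flag is absent or false,
     * roles are ignored and every live node is eligible (8.x-like behavior). Once the flag
     * is flipped to true, only nodes carrying the "data" role are eligible.
     */
    static List<String> eligibleForReplicas(Map<String, String> clusterProps,
                                            Map<String, Set<String>> rolesByNode) {
        boolean rolesEnabled =
                Boolean.parseBoolean(clusterProps.getOrDefault(ENABLE_ROLES_PROP, "false"));
        return rolesByNode.entrySet().stream()
                .filter(e -> !rolesEnabled || e.getValue().contains("data"))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> roles = Map.of(
                "node1", Set.of("data"),
                "node2", Set.of("coordinator"));
        // Right after the upgrade: roles are assigned but not yet enforced.
        System.out.println(eligibleForReplicas(Map.of(), roles));
        // After flipping the property: the coordinator node is excluded.
        System.out.println(eligibleForReplicas(Map.of(ENABLE_ROLES_PROP, "true"), roles));
    }
}
```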
>>>>>
>>>>> On 4 Nov 2021 at 03:36, Noble Paul <[email protected]> wrote:
>>>>>
>>>>> Thanks everyone for participating in the discussion. I have gone
>>>>> through all your valuable inputs, and these are my suggestions.
>>>>>
>>>>> Requirements?
>>>>>
>>>>>    1. Users should be able to designate a node with some role at
>>>>>    startup (say, -Dnode.roles=coordinator)
>>>>>    2. This node should be able to perform a certain behavior
>>>>>    3. Replica placement should be aware of this and may choose to
>>>>>    place or not place a replica on this node
>>>>>    4. Any client should be able to query any node in the cluster to
>>>>>    get a list of nodes with a specified role, or get the roles of a
>>>>>    given node
>>>>>
>>>>> Implementation?
>>>>> Here is how we could implement each of the requirements:
>>>>>
>>>>>    1. We could theoretically use a well-known system property and
>>>>>    2. The actual behavior will have to be implemented in both 8.x and
>>>>>    9.x
>>>>>    3. Placement of replicas
>>>>>       1. It's not possible to do this in 8.x
>>>>>       2. In 9.x, the replica placement plugin can be used internally to
>>>>>       ensure proper placement of replicas in the roles feature.
>>>>>          1. It can't be done with the current design, as users cannot
>>>>>          chain multiple placement plugins, or the user has to build a
>>>>>          custom placement plugin of their own
>>>>>          2. There is no standard UX to achieve this. It will be a
>>>>>          recipe (start nodes with this property, create these rules,
>>>>>          etc.). This is awkward and error-prone compared to saying
>>>>>          "start a node with the coordinator role" and letting Solr take
>>>>>          care of it.
>>>>>    4. There will be a new API endpoint to publish this
>>>>>    information in 8.x and 9.x. This endpoint is important to make this
>>>>>    feature usable
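Requirement 4 and the proposed endpoint boil down to two lookups over published node/role data. The sketch below keeps that data in a hypothetical in-memory registry; in a real cluster it would have to be shared (for example through ZooKeeper) so that any node can answer. Class and method names are illustrative only.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class NodeRolesRegistry {

    // Stand-in for cluster-wide published state; a plain map for illustration.
    private final Map<String, Set<String>> rolesByNode = new ConcurrentHashMap<>();

    void register(String node, Set<String> roles) {
        rolesByNode.put(node, Set.copyOf(roles));
    }

    void unregister(String node) {
        rolesByNode.remove(node);
    }

    /** Roles of a given node, or an empty set if the node is unknown. */
    Set<String> rolesOf(String node) {
        return rolesByNode.getOrDefault(node, Set.of());
    }

    /** All nodes that carry the given role, sorted for stable output. */
    SortedSet<String> nodesWithRole(String role) {
        SortedSet<String> result = new TreeSet<>();
        rolesByNode.forEach((node, roles) -> {
            if (roles.contains(role)) {
                result.add(node);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        NodeRolesRegistry registry = new NodeRolesRegistry();
        registry.register("node1", Set.of("data"));
        registry.register("node2", Set.of("data", "overseer"));
        registry.register("node3", Set.of("coordinator"));
        System.out.println(registry.nodesWithRole("data")); // [node1, node2]
        System.out.println(registry.rolesOf("node3"));      // [coordinator]
    }
}
```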
>>>>>
>>>>>
>>>>> Conclusion
>>>>>
>>>>>    1. With a roles feature, we can achieve the objectives in a user
>>>>>    friendly and intuitive way
>>>>>    2. The user interface can be consistent across 8.x and 9.x even
>>>>>    though 9.x can use the placement plugin internally
>>>>>    3. The actual roles definition will be the same across 8.x and 9.x
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 4, 2021 at 6:32 AM Noble Paul <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Michael
>>>>>>
>>>>>> We explored all options before arriving at this solution. Ishan
>>>>>> has already explained why Tim's suggestions have shortcomings when it
>>>>>> comes to user experience.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 4, 2021, 3:51 AM Michael Gibney <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> >I actually didn't realize that an empty Solr node would forward the
>>>>>>> >top-level request onward instead of just being the query controller
>>>>>>> >itself? That actually seems like a bug vs. a feature, IMO any node that
>>>>>>> >receives the top-level query should just be the coordinator, what stops it?
>>>>>>>
>>>>>>> +1 to Tim's statement quoted above; unless I'm missing something,
>>>>>>> this feels like an issue that should be addressed regardless of this SIP
>>>>>>> (perhaps it would be addressed incidentally by this SIP?). In any event,
>>>>>>> the current situation does not seem to make sense. As Tim points out, the
>>>>>>> relevant configs should in principle be accessible from ZK whether or not
>>>>>>> there's a core for a given collection on a given node.
>>>>>>>
>>>>>>> Considering the above, and especially given that, Ishan, you say "The
>>>>>>> coordinator role is the biggest motivation for introducing the concept of
>>>>>>> roles", while reading the SIP I found myself wishing for a fuller
>>>>>>> enumeration of use cases, and a more sympathetic characterization of
>>>>>>> alternatives (existing alternatives and, as with the above "proxy
>>>>>>> request" issue, perhaps simpler-but-not-yet-implemented alternatives).
>>>>>>>
>>>>>>> Combining questions about use cases with questions about
>>>>>>> alternatives: assuming that 9.x autoscaling can indeed be reliably used to
>>>>>>> stop replicas from being placed on nodes, how close would addressing the
>>>>>>> orthogonal "proxy request" issue come to covering the potential use cases?
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I think if we have the new "pseudo core" abstraction (I like it!
>>>>>>>> Will it really be a core with an index on disk, or some new abstraction
>>>>>>>> tracked only in ZK and in memory?) to play the role of coordinator, then
>>>>>>>> we have all we need with the affinity placement plugin framework for a
>>>>>>>> data-free coordinator node implementation.
>>>>>>>> It is easy to use system properties to exclude nodes from
>>>>>>>> receiving replicas using the placement plugins, with a minor change in
>>>>>>>> the Affinity Placement Plugin. Such nodes will not receive any replicas
>>>>>>>> from the placement plugin, not even at startup (the system property will
>>>>>>>> be assigned at startup, so no manual intervention is needed).
>>>>>>>>
>>>>>>>> It will not work if switching to another placement plugin, unless
>>>>>>>> that other plugin reimplements that (simple) aspect. Is that an issue?
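Ilan's alternative, excluding nodes from placement based on a startup system property, might look roughly like this candidate filter. The `node_type` property name and the `coordinator` value are hypothetical, and this is not the actual Affinity placement plugin code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SyspropPlacementFilter {

    // Hypothetical sysprop name/value, e.g. a node started with -Dnode_type=coordinator.
    // Not necessarily the key that Solr's Affinity placement plugin reads.
    static final String NODE_TYPE_SYSPROP = "node_type";
    static final String NO_DATA_NODE_TYPE = "coordinator";

    /** Removes candidate nodes whose startup sysprop marks them as not eligible for replicas. */
    static List<String> filterCandidates(List<String> candidates,
                                         Map<String, Map<String, String>> syspropsByNode) {
        return candidates.stream()
                .filter(node -> !NO_DATA_NODE_TYPE.equals(
                        syspropsByNode.getOrDefault(node, Map.of()).get(NODE_TYPE_SYSPROP)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> sysprops = Map.of(
                "node1", Map.of(),
                "node2", Map.of(NODE_TYPE_SYSPROP, NO_DATA_NODE_TYPE));
        System.out.println(filterCandidates(List.of("node1", "node2"), sysprops)); // [node1]
    }
}
```

As Ilan notes, this only holds while this particular filter is in the active plugin; a different placement plugin would have to repeat the check.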
>>>>>>>>
>>>>>>>> Ilan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Answers inline below.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> One last thought on this for me ... I think it would be beneficial
>>>>>>>>>> for the SIP to address how this new feature will work with the
>>>>>>>>>> existing shards.preference solution and affinity based placement plugin.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I was more inclined to keep this SIP focused on the broad concept of
>>>>>>>>> roles, and have any upcoming roles (the coordinator role, along with the
>>>>>>>>> pseudo-core functionality) described in their own issues (e.g.
>>>>>>>>> SOLR-15715).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Moreover, your pseudo-replica solution sounds like a new replica type
>>>>>>>>>> vs. a node level thing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I misspoke when I called it a "pseudo replica"; it is actually a
>>>>>>>>> "pseudo core". Replicas are shard-level concepts, but the pseudo core
>>>>>>>>> we plan to introduce will pertain to one or more collections. Imagine
>>>>>>>>> collection1 has shard1 and shard2: there will be a single pseudo core
>>>>>>>>> for collection1 (we haven't decided on the prefix of this pseudo core
>>>>>>>>> yet, but a candidate is ".collection1_coordinator"). A replica type
>>>>>>>>> won't fit this mental model. We can discuss this more in the
>>>>>>>>> SOLR-15715 issue.
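To make the pseudo core versus replica distinction concrete: replicas exist per shard, while the pseudo core in this proposal is per collection. A small sketch, assuming the candidate `.collection1_coordinator` naming from the message above (the prefix was explicitly undecided) and hypothetical class and method names:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PseudoCoreLayoutSketch {

    // ".<collection>_coordinator" is only the candidate name mentioned in the thread.
    static String pseudoCoreName(String collection) {
        return "." + collection + "_coordinator";
    }

    /**
     * Given collection -> shard names, returns collection -> pseudo core name on a
     * coordinator node. The shard lists are deliberately ignored: there is one pseudo
     * core per collection, however many shards (and hence replicas) it has.
     */
    static Map<String, String> pseudoCores(Map<String, List<String>> shardsByCollection) {
        Map<String, String> result = new TreeMap<>();
        for (String collection : shardsByCollection.keySet()) {
            result.put(collection, pseudoCoreName(collection));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> layout = Map.of("collection1", List.of("shard1", "shard2"));
        System.out.println(pseudoCores(layout)); // {collection1=.collection1_coordinator}
    }
}
```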
>>>>>>>>>
>>>>>>>>>> The placement strategy can place replicas
>>>>>>>>>> based on replica type and node type (just a system property), so
>>>>>>>>>> please address why you can't achieve a query coordinator behavior with
>>>>>>>>>> a new replica type + improvements to the Affinity placement plugin.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To put down my thoughts on why the Affinity placement plugin won't
>>>>>>>>> work for ensuring that we have nodes that host no data:
>>>>>>>>> 1. We want the ability to have nodes with no data on them as a
>>>>>>>>> first-class concept for users. Hence, if the Affinity placement plugin
>>>>>>>>> is used for that purpose, users won't be able to switch out that plugin
>>>>>>>>> and use anything of their own. Currently, IIUC, there's no way for users
>>>>>>>>> to use multiple placement plugins.
>>>>>>>>> 2. Nodes that shouldn't host any replicas are generally
>>>>>>>>> ephemeral in nature; many of them may join the cluster, and they may go
>>>>>>>>> away. When such a node joins the cluster, it immediately becomes
>>>>>>>>> eligible for replica placement, before the sysadmin is even able to
>>>>>>>>> assign an affinity placement configuration for that node. This is a
>>>>>>>>> problem.
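Point 2 describes a timing window. The toy model below, with hypothetical names and a simplified eligibility rule, shows a node started without roles being eligible until an admin applies a rule, while a node started with a non-data role is never eligible:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class JoinWindowSketch {

    private final Map<String, Set<String>> startupRoles = new HashMap<>();
    private final Set<String> excludedByLaterPolicy = new HashSet<>();

    /** A node joins the cluster, declaring roles at startup (an empty set means none declared). */
    void join(String node, Set<String> roles) {
        startupRoles.put(node, roles);
    }

    /** An admin applies a placement rule after the node has already joined. */
    void excludeLater(String node) {
        excludedByLaterPolicy.add(node);
    }

    /** Whether the placement logic would currently consider this node for new replicas. */
    boolean eligible(String node) {
        Set<String> roles = startupRoles.get(node);
        if (roles == null || excludedByLaterPolicy.contains(node)) {
            return false;
        }
        // No declared roles is treated as a legacy data node in this sketch.
        return roles.isEmpty() || roles.contains("data");
    }

    public static void main(String[] args) {
        JoinWindowSketch cluster = new JoinWindowSketch();

        cluster.join("nodeA", Set.of());                // started "in regular mode"
        boolean openWindow = cluster.eligible("nodeA"); // true: replicas could land here...
        cluster.excludeLater("nodeA");                  // ...until the rule is applied

        cluster.join("nodeB", Set.of("coordinator"));   // role given as a startup param
        boolean neverEligible = !cluster.eligible("nodeB");

        System.out.println(openWindow + " " + neverEligible + " " + cluster.eligible("nodeA"));
        // prints "true true false"
    }
}
```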
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Tim
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for your thoughts and feedback; I think it will help us put
>>>>>>>>> together the document with more insight into our design choices.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ishan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Also, in a cluster where new collections/shards/replicas are
>>>>>>>>>> > continuously being added, it would be pretty awkward to start a node
>>>>>>>>>> > (in regular mode), have it briefly become eligible for replica
>>>>>>>>>> > assignment, and then invoke a replica placement rule/autoscaling
>>>>>>>>>> > policy so that no replicas are placed on that node. Instead, starting
>>>>>>>>>> > a node with a defined role (as a startup param) precludes that brief
>>>>>>>>>> > period of eligibility for replica placement on such a node.
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> If we were to tell users how to do "scatter gather on an empty
>>>>>>>>>> >> node", *how exactly* would you recommend users get an empty node to
>>>>>>>>>> >> begin with? Wouldn't you say something like "for 8x you can do this
>>>>>>>>>> >> (rule based replica placement) or do that (autoscaling), but for 9x
>>>>>>>>>> >> you do this new thing"? Having a node that doesn't have a data role
>>>>>>>>>> >> seems like a consistent and elegant way for users to invoke such
>>>>>>>>>> >> functionality and also easily relate to a broad concept, without
>>>>>>>>>> >> having to deal with autoscaling frameworks of the ancient past,
>>>>>>>>>> >> medieval past or the future.
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> As opposed to what? Looking up the configset for the addressed
>>>>>>>>>> >>> collection and pulling whatever information it needs from cached
>>>>>>>>>> >>> data. I'm sure there are some nuances, but I hardly think you need a
>>>>>>>>>> >>> node role framework to determine the unique key field to do scatter
>>>>>>>>>> >>> gather on an empty node when you have easy access to collection
>>>>>>>>>> >>> metadata.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Doesn't seem like a hard thing to overcome to me.
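Tim's suggestion, resolving what a coordinator needs from cached collection metadata instead of a local core, can be sketched as a two-step lookup. The maps stand in for cached cluster state and are assumptions, as are the names; the sketch does not address Noble's reply that request processing currently requires a core using the collection's configset.

```java
import java.util.Map;
import java.util.Optional;

class EmptyNodeMetadataLookup {

    // Hypothetical cached views of cluster state: collection -> configset name,
    // and configset name -> uniqueKey field declared in its schema.
    private final Map<String, String> configSetByCollection;
    private final Map<String, String> uniqueKeyByConfigSet;

    EmptyNodeMetadataLookup(Map<String, String> configSetByCollection,
                            Map<String, String> uniqueKeyByConfigSet) {
        this.configSetByCollection = configSetByCollection;
        this.uniqueKeyByConfigSet = uniqueKeyByConfigSet;
    }

    /** Resolves the uniqueKey field for a collection without any local core; empty if unknown. */
    Optional<String> uniqueKeyFor(String collection) {
        return Optional.ofNullable(configSetByCollection.get(collection))
                .map(uniqueKeyByConfigSet::get);
    }

    public static void main(String[] args) {
        EmptyNodeMetadataLookup lookup = new EmptyNodeMetadataLookup(
                Map.of("collection1", "myconf"),
                Map.of("myconf", "id"));
        System.out.println(lookup.uniqueKeyFor("collection1").orElse("<none>")); // id
        System.out.println(lookup.uniqueKeyFor("missing").orElse("<none>"));     // <none>
    }
}
```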
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <[email protected]> wrote:
>>>>>>>>>> >>> >
>>>>>>>>>> >>> >
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> I'm not missing the point of the query coordinator, but I actually
>>>>>>>>>> >>> >> didn't realize that an empty Solr node would forward the top-level
>>>>>>>>>> >>> >> request onward instead of just being the query controller itself?
>>>>>>>>>> >>> >> That actually seems like a bug vs. a feature, IMO any node that
>>>>>>>>>> >>> >> receives the top-level query should just be the coordinator, what
>>>>>>>>>> >>> >> stops it?
>>>>>>>>>> >>> >
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > To process a request, there must be a core that uses the same
>>>>>>>>>> >>> > configset as the requested collection.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> Anyway, it sounds to me like you guys have your minds made up
>>>>>>>>>> >>> >> regardless of feedback.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> Btw ~ I only mentioned the ZooKeeper part b/c it's in your SIP as a
>>>>>>>>>> >>> >> specific role, not sure why you took that as me wanting to discuss
>>>>>>>>>> >>> >> the embedded ZK in your SIP?
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya
>>>>>>>>>> >>> >> <[email protected]> wrote:
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > Hi Tim,
>>>>>>>>>> >>> >> > Here are my responses inline.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> I'm just not convinced this feature is even needed, and the SIP
>>>>>>>>>> >>> >> >> is not convincing that "There is no proper alternative today."
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > There are no proper alternatives today, just hacks. On 8x, we
>>>>>>>>>> >>> >> > have two different deprecated frameworks to stop replicas from
>>>>>>>>>> >>> >> > being placed on a node (1. rule based replica placement, 2. the
>>>>>>>>>> >>> >> > autoscaling framework). On 9x, we have a new autoscaling
>>>>>>>>>> >>> >> > framework, which I don't think is even fully implemented. And
>>>>>>>>>> >>> >> > there's definitely no way to have a node act as a query
>>>>>>>>>> >>> >> > coordinator without having data on it.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles,
>>>>>>>>>> >>> >> >> doesn't mean Solr needs this.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > Solr needs this. That Elastic has such concepts is a
>>>>>>>>>> >>> >> > coincidence, and it also means we have an opportunity to catch up
>>>>>>>>>> >>> >> > with them; they have these concepts for a reason.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> Also, some of Elastic's roles overlap with concepts Solr already
>>>>>>>>>> >>> >> >> has in a different form, e.g. data_hot sounds like NRT and
>>>>>>>>>> >>> >> >> data_warm sounds a lot like our Pull Replica Type.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > I think that is beyond the scope of this SIP.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> 2) You can achieve the "coordinator" role with auto-scaling rules
>>>>>>>>>> >>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it even has a
>>>>>>>>>> >>> >> >> node type built in:
>>>>>>>>>> >>> >> >> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP)).
>>>>>>>>>> >>> >> >> Simply build your replica placement rules such that no replicas
>>>>>>>>>> >>> >> >> land on "coordinator" nodes. And you can already route queries by
>>>>>>>>>> >>> >> >> node.sysprop using shards.preference.
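For reference, the `node.sysprop` option Tim mentions is used as `shards.preference=node.sysprop:sysprop.<name>` (for example `sysprop.rack` for nodes started with `-Drack=...`). The sketch below only imitates the ordering idea in plain Java; it is not Solr's routing implementation, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class SyspropPreferenceSketch {

    /**
     * Orders replica-hosting nodes so that nodes sharing the local node's value for
     * the given system property come first; relative order is otherwise preserved
     * (List.sort is stable). If the local node lacks the property, nodes that also
     * lack it are treated as matching.
     */
    static List<String> orderByPreference(List<String> replicaNodes, String syspropName,
                                          String localNode,
                                          Map<String, Map<String, String>> syspropsByNode) {
        String localValue = syspropsByNode.getOrDefault(localNode, Map.of()).get(syspropName);
        List<String> ordered = new ArrayList<>(replicaNodes);
        // Boolean.FALSE sorts before Boolean.TRUE, so matching nodes (mismatch == false) go first.
        ordered.sort(Comparator.comparing((String node) -> !Objects.equals(
                localValue, syspropsByNode.getOrDefault(node, Map.of()).get(syspropName))));
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> sysprops = Map.of(
                "coord1", Map.of("rack", "rack1"),
                "nodeA", Map.of("rack", "rack2"),
                "nodeB", Map.of("rack", "rack1"));
        // Roughly the idea of shards.preference=node.sysprop:sysprop.rack, seen from coord1.
        System.out.println(orderByPreference(List.of("nodeA", "nodeB"), "rack", "coord1", sysprops));
        // prints [nodeB, nodeA]
    }
}
```

Note that ordering alone still sends the request to a replica-hosting node, which is the gap Ishan points to in his reply.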
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > I think you missed the whole point of the query coordinator.
>>>>>>>>>> >>> >> > Please refer to https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>> >>> >> > Let me summarize the main difference between what (I think) you
>>>>>>>>>> >>> >> > refer to and what is proposed in SOLR-15715.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > With your suggestion, we'll have a node that doesn't host any
>>>>>>>>>> >>> >> > replicas, and you suggest that queries landing on such nodes be
>>>>>>>>>> >>> >> > routed using shards.preference? Well, in that case, these queries
>>>>>>>>>> >>> >> > will be forwarded/proxied to a random node hosting a replica of the
>>>>>>>>>> >>> >> > collection, and that node then acts as the coordinator. This is no
>>>>>>>>>> >>> >> > better than sending the query directly to that node.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation
>>>>>>>>>> >>> >> > functionality. There will be pseudo replicas (aware of the
>>>>>>>>>> >>> >> > configset) on this coordinator node that handle the request
>>>>>>>>>> >>> >> > themselves: they send shard requests to data-hosting replicas,
>>>>>>>>>> >>> >> > collect and merge the responses, and send the result back to the
>>>>>>>>>> >>> >> > user. This merge step is usually extremely memory intensive, and
>>>>>>>>>> >>> >> > it would be good to serve it off stateless nodes (that host no
>>>>>>>>>> >>> >> > data).
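The merge step described above is, at its core, a k-way top-k merge. A minimal sketch using a bounded min-heap, assuming each shard returns (id, score) pairs; Solr's real distributed search involves more than this (for example, fetching stored fields in a later stage and merging facets), so treat it only as an illustration of work that lands on the coordinator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKMergeSketch {

    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + ":" + score;
        }
    }

    /** Merges per-shard hit lists into the global top k by descending score. */
    static List<Hit> mergeTopK(List<List<Hit>> shardResponses, int k) {
        // Min-heap on score: the root is the weakest of the best hits kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : shardResponses) {
            for (Hit hit : shard) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll(); // evict the lowest-scoring hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard1 = List.of(new Hit("a", 0.9f), new Hit("b", 0.5f));
        List<Hit> shard2 = List.of(new Hit("c", 0.8f), new Hit("d", 0.7f));
        System.out.println(mergeTopK(List.of(shard1, shard2), 3)); // [a:0.9, c:0.8, d:0.7]
    }
}
```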
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing the
>>>>>>>>>> >>> >> >> overseer?!? Also, we already have the ability to run the overseer
>>>>>>>>>> >>> >> >> on specific nodes w/o a new framework, so this doesn't really
>>>>>>>>>> >>> >> >> convince me we need a new framework.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > There's absolutely no change proposed to the "overseer" role.
>>>>>>>>>> >>> >> > What users need on production clusters are nodes dedicated to
>>>>>>>>>> >>> >> > overseer operations, and for that the current "overseer" role
>>>>>>>>>> >>> >> > suffices, together with some functionality to not place replicas
>>>>>>>>>> >>> >> > on such nodes.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> 4) We will indeed need to decide which nodes host embedded
>>>>>>>>>> >>> >> >> ZooKeepers, but I'd argue that solution hasn't been fully
>>>>>>>>>> >>> >> >> designed, and we probably don't need a formal node role framework
>>>>>>>>>> >>> >> >> to determine which nodes host embedded ZKs. Moreover, embedded ZK
>>>>>>>>>> >>> >> >> seems more like a small-cluster thing, and anyone running a large
>>>>>>>>>> >>> >> >> cluster will probably have a dedicated ZK ensemble as they do
>>>>>>>>>> >>> >> >> today. The node role thing seems like it's intended for large
>>>>>>>>>> >>> >> >> clusters, and my gut says few will use embedded ZK for large
>>>>>>>>>> >>> >> >> clusters.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > This SIP is not the right place for this discussion. There's a
>>>>>>>>>> >>> >> > separate SIP for this.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> 5) You can also achieve a lot of "node role" functionality in
>>>>>>>>>> >>> >> >> query routing using the shards.preference parameter.
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > That doesn't serve the purpose behind
>>>>>>>>>> >>> >> > https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> At the very least, the SIP needs to list specific use cases that
>>>>>>>>>> >>> >> >> require this feature and are not achievable with the current
>>>>>>>>>> >>> >> >> features, before getting bogged down in the impl. details.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > The coordinator role is the biggest motivation for introducing
>>>>>>>>>> >>> >> > the concept of roles. However, in addition to what is proposed in
>>>>>>>>>> >>> >> > SOLR-15715, a coordinator node can later also be used as a node
>>>>>>>>>> >>> >> > for users to run streaming expressions on or do bulk indexing on
>>>>>>>>>> >>> >> > (impl details for this to come later; I don't want a distraction
>>>>>>>>>> >>> >> > here).
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> Tim
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >> > I think there are things not yet accounted for. Time I spent
>>>>>>>>>> >>> >> >> > yesterday is biting me today. Pls give a couple days.
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Hey Ishan,
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I appreciate you writing up the SIP! Here are some notes/questions
>>>>>>>>>> >>> >> >> >> I had as I was reading through your writeup and this mail thread.
>>>>>>>>>> >>> >> >> >> ("----" separators between thoughts, hopefully that helps.)
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
>>>>>>>>>> >>> >> >> >> suggested: roles should default to "all-on". I see the downsides
>>>>>>>>>> >>> >> >> >> you're worried about with that approach (esp. around 'overseer'),
>>>>>>>>>> >>> >> >> >> but they may be mitigable, at least in part.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be a dedicated
>>>>>>>>>> >>> >> >> >> > overseer, but for that to happen, he/she would need to restart
>>>>>>>>>> >>> >> >> >> > all the data nodes with -Dnode.roles=data
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Sure, if roles can only be specified at startup. But that may be
>>>>>>>>>> >>> >> >> >> a self-imposed constraint.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> An API to change a node's roles would remove the need for a
>>>>>>>>>> >>> >> >> >> restart and make it easy for users to effect the semantics they
>>>>>>>>>> >>> >> >> >> want. You decided you want a dedicated overseer N nodes into your
>>>>>>>>>> >>> >> >> >> cluster deployment? Deploy node 'N' with the 'overseer' role, and
>>>>>>>>>> >>> >> >> >> toggle the overseer role off on the remainder.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Now, I understand that you don't want roles to change at runtime,
>>>>>>>>>> >>> >> >> >> but I haven't seen you get much into "why", beyond saying "it is
>>>>>>>>>> >>> >> >> >> very risky to have nodes change roles while they are up and
>>>>>>>>>> >>> >> >> >> running." Can you expand a bit on the risks you're worried about?
>>>>>>>>>> >>> >> >> >> If you're explicit about them here, maybe someone can think of a
>>>>>>>>>> >>> >> >> >> clever way to address them.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all roles", then just
>>>>>>>>>> >>> >> >> >> > by virtue of upgrading to this new version, new capabilities will
>>>>>>>>>> >>> >> >> >> > be turned on for the entire cluster, whether or not the user opted
>>>>>>>>>> >>> >> >> >> > for such a capability. This is totally undesirable.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of
>>>>>>>>>> functionality than
>>>>>>>>>> >>> >> >> >> usual, so in a sense defaulting roles on is
>>>>>>>>>> scarier.  But in a sense
>>>>>>>>>> >>> >> >> >> you're describing something that's an inherent part
>>>>>>>>>> of software
>>>>>>>>>> >>> >> >> >> releases.  Releases expose new features that are
>>>>>>>>>> typically on by
>>>>>>>>>> >>> >> >> >> default.  A new default-on role in 9.1 might hurt a
>>>>>>>>>> user, but there's
>>>>>>>>>> >>> >> >> >> no fundamental difference between that and a change
>>>>>>>>>> to backups or
>>>>>>>>>> >>> >> >> >> replication or whatever in the same release.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I don't mean to belittle the difference in scope; I get your
>>>>>>>>>> >>> >> >> >> concern. But IMO this is something to address with good release
>>>>>>>>>> >>> >> >> >> notes and documentation. Designing for admins who don't do even
>>>>>>>>>> >>> >> >> >> cursory research before an upgrade ties both our hands behind our
>>>>>>>>>> >>> >> >> >> back as a project.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [SIP] Internal representation in ZK ... Implementation details
>>>>>>>>>> >>> >> >> >> > like these can be fleshed out in the PR
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> IMO this is important enough to flesh out as part of the SIP, at
>>>>>>>>>> >>> >> >> >> least in broad strokes. It affects backcompat, SolrJ client
>>>>>>>>>> >>> >> >> >> design, etc.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Woohoo - way to include a v2 API definition!
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined; I wonder whether
>>>>>>>>>> >>> >> >> >> "GET /nodes/someNode/roles" wouldn't be a more intuitive endpoint
>>>>>>>>>> >>> >> >> >> for the "get the roles this node has" functionality. Though I
>>>>>>>>>> >>> >> >> >> leave that for your consideration.
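The two endpoint shapes under discussion could coexist by resolving both to the same node lookup. A sketch with hypothetical class names; neither path is an existing Solr API, and accepting an optional `/api` prefix on the per-node form is an assumption.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RolesEndpointSketch {

    // Both shapes come from the thread and are proposals, not existing Solr endpoints:
    //   GET /api/cluster/roles?node=node1   (from the SIP)
    //   GET /nodes/someNode/roles           (Jason's suggestion; an /api prefix is assumed optional)
    private static final Pattern PER_NODE_PATH = Pattern.compile("(?:/api)?/nodes/([^/]+)/roles");

    /** Extracts the node name from either request shape; empty if the request matches neither. */
    static Optional<String> requestedNode(URI uri) {
        String path = uri.getPath();
        Matcher m = PER_NODE_PATH.matcher(path);
        if (m.matches()) {
            return Optional.of(m.group(1));
        }
        if ("/api/cluster/roles".equals(path) && uri.getQuery() != null) {
            for (String pair : uri.getQuery().split("&")) {
                if (pair.startsWith("node=")) {
                    return Optional.of(pair.substring("node=".length()));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(requestedNode(URI.create("/api/cluster/roles?node=node1"))); // Optional[node1]
        System.out.println(requestedNode(URI.create("/nodes/someNode/roles")));         // Optional[someNode]
    }
}
```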
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Looking forward to your responses and seeing the SIP progress!
>>>>>>>>>> >>> >> >> >> It's a really cool, promising idea IMO.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Best,
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Jason
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
>>>>>>>>>> >>> >> >> >> <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>> >>> >> >> >> > Are there any outstanding, unaddressed concerns that we should
>>>>>>>>>> >>> >> >> >> > hold up the SIP for?
>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query analysis"
>>>>>>>>>> >>> >> >> >> >>> >> has a role of its own. Where would that lead us? Separate roles
>>>>>>>>>> >>> >> >> >> >>> >> for nodes that do "faceting" or "spell correction" etc.? But
>>>>>>>>>> >>> >> >> >> >>> >> anyway, that is for discussion when we add future roles. This is
>>>>>>>>>> >>> >> >> >> >>> >> beyond this SIP.
>>>>>>>>>> roles. This is beyond this SIP.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> > I am not asking you to implement every possible role of course :).
>>>>>>>>>> >>> >> >> >> >> > As a note, I know a company that is running an entire separate
>>>>>>>>>> >>> >> >> >> >> > cluster to offload and better serve highlighting on a subset of
>>>>>>>>>> >>> >> >> >> >> > large docs, so YES, I think there are people who may want such
>>>>>>>>>> >>> >> >> >> >> > fine-grained control.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> Cool, I think we can discuss adding any additional roles (for
>>>>>>>>>> >>> >> >> >> >> highlighting?) on a case-by-case basis at a later point.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> > Boiling it down, the idea I'm proposing is that roles required
>>>>>>>>>> >>> >> >> >> >>> > for back compatibility get explicitly added on startup, if not by
>>>>>>>>>> >>> >> >> >> >>> > the user then by the code. This is more flexible than assuming
>>>>>>>>>> >>> >> >> >> >>> > that no role means every role, because then every new feature
>>>>>>>>>> >>> >> >> >> >>> > that has a role will end up on legacy clusters, which are also
>>>>>>>>>> >>> >> >> >> >>> > not back compatible.
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> +1, I totally agree. I even said so when I said: "This is why I
>>>>>>>>>> >>> >> >> >> >>> was advocating that 1) we assume "data" as a default, 2) we not
>>>>>>>>>> >>> >> >> >> >>> assume overseer to be implicitly defined (because of the way the
>>>>>>>>>> >>> >> >> >> >>> overseer role is written today), 3) we not assume any future
>>>>>>>>>> >>> >> >> >> >>> roles to be true by default."
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles required for back
>>>>>>>>>> >>> >> >> >> >>> compatibility" (that should be explicitly added on startup) be
>>>>>>>>>> >>> >> >> >> >>> just the ["data"] role, and not the "overseer" role (due to the
>>>>>>>>>> >>> >> >> >> >>> way the overseer role is currently defined, i.e. it is "preferred
>>>>>>>>>> >>> >> >> >> >>> overseer").
>>>>>>>>>> >>> >> >> >> >>>
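A minimal Java sketch of the startup step agreed on above. A node's roles are fixed once at startup. They come from the user's value or, when none is given, from a back-compat default of just ["data"]. The sketch assumes the `-Dnode.roles` system property used in this thread holds a comma-separated list. The class and method names are hypothetical, and this is not Solr's implementation (Java 11+).

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Solr code: resolve a node's roles once, at startup.
final class NodeRolesResolver {
    // Roles the code adds when the user specifies none (per the proposal: only "data").
    static final Set<String> BACK_COMPAT_DEFAULT = Set.of("data");

    /** Parses a comma-separated value such as "data,overseer"; null or blank means "not specified". */
    static Set<String> resolve(String rawValue) {
        if (rawValue == null || rawValue.isBlank()) {
            return BACK_COMPAT_DEFAULT;
        }
        Set<String> roles = new LinkedHashSet<>();
        for (String part : rawValue.split(",")) {
            String role = part.trim();
            if (!role.isEmpty()) {
                roles.add(role);
            }
        }
        // Unmodifiable, so the resolved roles cannot change while the node runs.
        return Collections.unmodifiableSet(roles);
    }

    public static void main(String[] args) {
        // Assumed property name, taken from the -Dnode.roles examples in this thread.
        System.out.println("Effective roles: " + resolve(System.getProperty("node.roles")));
    }
}
```

Under this sketch, a node started with `-Dnode.roles=overseer` resolves to `[overseer]`. A legacy node with no value resolves to `[data]` and is never implicitly an overseer.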
>>>>>>>>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>> Very sorry, I don't mean to sound offended. Frustrated yes, offended no :)... the most difficult thing about communication is the illusion that it has occurred :)
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>> If you read back just a few emails, you'll see where I talk about roles being applied on startup. Boiling it down, the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code. This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up enabled on legacy clusters, which is also not back compatible.
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>> There are points where I said "all roles" rather than "back compatibility roles" because I was thinking about back compatibility specifically, but you can't know that if I don't say it, can you :).
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>> >>> >> >> >> >>>>> > If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm trying my best. Please don't be offended, and kindly point me to the exact part you want me to re-read.
>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Positive - They denote the existence of a capability
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this.
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >   Absolute - Absence/Presence binary identification of a capability; no implications, no assumptions
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling for nodes running without any roles. There has to be an implicit assumption as to what roles those nodes are assumed to have. My proposal is that only the "data" role be assumed, but not the "overseer" role. For any future roles ("coordinator", "zookeeper" etc.), the decision as to what the absence of any role implies should be left to the implementation of that future role. Documentation should clearly reflect these implicit assumptions.
>>>>>>>>>> >>> >> >> >> >>>>>>>
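One way to make "left to the implementation of that future role" concrete and documentable is to have each role definition state whether a role-less legacy node is assumed to hold it. This is a hypothetical Java sketch (Java 16+ for the record), not Solr code. The two future roles are shown as not assumed purely for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Solr code: each role definition states whether a node
// started WITHOUT any roles is assumed to have it, so the back-compat rule is
// decided (and documented) per role rather than globally.
final class RoleDefaults {
    record RoleDef(String name, boolean assumedWhenNoRolesGiven) {}

    // Per the proposal: "data" is assumed, "overseer" is not. The two future
    // roles are marked as not assumed purely for illustration.
    static final List<RoleDef> KNOWN_ROLES = List.of(
            new RoleDef("data", true),
            new RoleDef("overseer", false),
            new RoleDef("coordinator", false),
            new RoleDef("zookeeper", false));

    /** Roles implicitly held by a legacy node that was started with no roles. */
    static Set<String> implicitRoles() {
        Set<String> result = new LinkedHashSet<>();
        for (RoleDef def : KNOWN_ROLES) {
            if (def.assumedWhenNoRolesGiven()) {
                result.add(def.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(implicitRoles()); // [data]
    }
}
```

Because the flag lives next to each role, the documentation of implicit assumptions can be generated from the same list the code uses.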
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Focused - Do one thing per role
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us to? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> I am not asking you to implement every possible role of course :). As a note, I know a company that is running an entire separate cluster to offload and better serve highlighting on a subset of large docs, so YES, I think there are people who may want such fine-grained control.
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Accessible - It should be dead simple to determine the members of a role, avoid parsing blobs of json, avoid calculating implications, avoid consulting other resources after listing nodes with the role
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation details that make it easy. There should be a reasonable API to return these node roles, with the ability to filter by role or filter by node.
>>>>>>>>>> >>> >> >> >> >>>>>>>
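To illustrate the "Accessible" property and the filter-by-role / filter-by-node API discussed above, here is a hypothetical in-memory view in Java. It is not an existing Solr API, and how the data would be loaded (e.g. from ZooKeeper) is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not a Solr API: node -> roles kept as plain sets, queryable
// by node or by role with no JSON parsing and no implied roles to compute.
final class NodeRolesView {
    private final Map<String, Set<String>> rolesByNode = new TreeMap<>();

    void register(String node, Set<String> roles) {
        rolesByNode.put(node, Set.copyOf(roles)); // defensive, immutable copy
    }

    /** Filter by node: the roles a node was started with (empty if unknown). */
    Set<String> rolesOf(String node) {
        return rolesByNode.getOrDefault(node, Set.of());
    }

    /** Filter by role: every node carrying that role, in sorted order. */
    Set<String> nodesWithRole(String role) {
        Set<String> nodes = new TreeSet<>();
        rolesByNode.forEach((node, roles) -> {
            if (roles.contains(role)) {
                nodes.add(node);
            }
        });
        return nodes;
    }

    public static void main(String[] args) {
        NodeRolesView view = new NodeRolesView();
        view.register("node1", Set.of("data"));
        view.register("node101", Set.of("overseer"));
        System.out.println(view.nodesWithRole("overseer")); // [node101]
    }
}
```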
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Independent - One role should not require other roles to be present
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Do we need to have this hard-and-fast requirement upfront? There might be situations where one role depending on another is desirable. I feel we can discuss this on a case-by-case basis whenever a future role is added.
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Persistent - roles should not be lost across reboot
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Agree.
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Immutable - roles should not change while the node is running
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> Agree
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> >    Lively - A node with a capability may not be presently providing that capability.
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> Specifically, imagine the case where there are 106 nodes:
>>>>>>>>>> >>> >> >> >> >>>>>> 1-100 ==> DATA
>>>>>>>>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>>>>>>>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want only one of those to be providing overseer functionality and the other two to be capable, but not providing (so that if the current overseer goes down a new one can be assigned).
>>>>>>>>>> >>> >> >> >> >>>>>>
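The "capable versus providing" distinction can be sketched in Java as follows. Three nodes carry the overseer role, but only one is picked to act at a time, and another takes over when it goes down. Real Solr elects its overseer through ZooKeeper. The simple "first live capable node" rule and the names here are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of "capable" versus "providing": several nodes carry the
// overseer role, but only one provides the overseer function at a time. Real Solr
// elects its overseer through ZooKeeper; a first-live-node rule stands in here.
final class OverseerPicker {
    /** The first capable node (in the given order) that is currently live, if any. */
    static Optional<String> pickActive(List<String> capableNodes, Set<String> liveNodes) {
        return capableNodes.stream().filter(liveNodes::contains).findFirst();
    }

    public static void main(String[] args) {
        List<String> capable = List.of("node101", "node102", "node103");
        System.out.println(pickActive(capable, Set.of("node101", "node102", "node103"))); // Optional[node101]
        // node101 goes down: a remaining capable node takes over.
        System.out.println(pickActive(capable, Set.of("node102", "node103"))); // Optional[node102]
    }
}
```

The role answers "may this node do it?", while the election answers "is it doing it right now?".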
>>>>>>>>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 ZooKeepers. You start nodes 107-108 with that role, but you probably want to ensure that ZooKeepers require some sort of command to actually join the ZooKeeper cluster (e.g. /admin?action=ZKADD&nodes=node107,node18) ... to do that, the nodes need to be up. But oh look, I typoed 108... we want that to fail... how? Because 18 does not have the capability to become a ZooKeeper.
>>>>>>>>>> >>> >> >> >> >>>>>>
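The typo check in the ZKADD example comes down to a role-membership test before acting. Below is a minimal Java sketch. `ZKADD` is the illustrative action from the email, not an existing Solr API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reject a request such as ZKADD&nodes=node107,node18 when
// any listed node lacks the "zookeeper" role. ZKADD is the illustrative action
// from the email above, not an existing Solr API.
final class ZkAddValidator {
    static List<String> nodesMissingRole(List<String> requested,
                                         Map<String, Set<String>> rolesByNode,
                                         String requiredRole) {
        List<String> missing = new ArrayList<>();
        for (String node : requested) {
            // Unknown nodes count as having no roles at all.
            if (!rolesByNode.getOrDefault(node, Set.of()).contains(requiredRole)) {
                missing.add(node);
            }
        }
        return missing;
    }

    /** Throws if any node in the comma-separated "nodes" value cannot become a ZooKeeper. */
    static void validate(String nodesParam, Map<String, Set<String>> rolesByNode) {
        List<String> requested = new ArrayList<>();
        for (String part : nodesParam.split(",")) {
            requested.add(part.trim());
        }
        List<String> missing = nodesMissingRole(requested, rolesByNode, "zookeeper");
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Nodes without the zookeeper role: " + missing);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> roles = Map.of(
                "node18", Set.of("data"),
                "node107", Set.of("zookeeper"),
                "node108", Set.of("zookeeper"));
        validate("node107,node108", roles); // passes
        try {
            validate("node107,node18", roles);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Nodes without the zookeeper role: [node18]
        }
    }
}
```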
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions". Nothing to figure out. A node has a role or not. For back compatibility reasons, all roles would be assumed on startup if none specified.
>>>>>>>>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly those roles.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> The problem with this approach mainly has to do with backcompat.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat:
>>>>>>>>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how the overseer works and adopt this approach (as quoted), then imagine this situation:
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> The user wants this node Solr101 to be a dedicated overseer, but for that to happen, he/she would need to restart all the data nodes with -Dnode.roles=data. This will cause unnecessary disruption to running clusters where a dedicated overseer is needed. Keep in mind, if a user needs a dedicated overseer, he/she is likely in an emergency situation, and restarting the whole cluster might not be viable.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with this "assumed to have all roles" idea:
>>>>>>>>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role, for example. Today, regular nodes are not supposed to have embedded ZK running on them. By introducing this artificial limitation ("assumed to have all roles"), we constrain the adoption of every future role to necessarily require a full cluster restart.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> Keep in mind that newer Solr versions can introduce new capabilities and roles. Imagine we have a role that is defined in a new Solr version (with functionality to go with that role), and a user upgrades to that version. However, his/her nodes were all started with no node.roles param. Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability. This is totally undesirable.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
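The two back-compat scenarios above can be checked with a small Java sketch. It compares the two defaults for nodes started without roles: "no role == all roles" versus "no role == data only". The cluster layout mirrors the Solr1-100 / Solr101 example. The policy names and classes are hypothetical and not part of any Solr API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch comparing two defaults for nodes started without node.roles.
final class DefaultRolePolicies {
    enum Policy { ALL_ROLES, DATA_ONLY }

    // Every role this (hypothetical) Solr version knows about.
    static final Set<String> ALL_KNOWN_ROLES = Set.of("data", "overseer");

    static Set<String> effectiveRoles(Set<String> explicitRoles, Policy policy) {
        if (!explicitRoles.isEmpty()) {
            return explicitRoles; // an explicit list means exactly those roles
        }
        return policy == Policy.ALL_ROLES ? ALL_KNOWN_ROLES : Set.of("data");
    }

    /** Nodes eligible to become overseer under the given policy. */
    static Set<String> overseerCandidates(Map<String, Set<String>> explicitRolesByNode, Policy policy) {
        Set<String> candidates = new TreeSet<>();
        explicitRolesByNode.forEach((node, explicit) -> {
            if (effectiveRoles(explicit, policy).contains("overseer")) {
                candidates.add(node);
            }
        });
        return candidates;
    }

    public static void main(String[] args) {
        // Solr1..Solr100 started with no roles; Solr101 with -Dnode.roles=overseer.
        Map<String, Set<String>> cluster = new TreeMap<>();
        for (int i = 1; i <= 100; i++) {
            cluster.put("Solr" + i, Set.of());
        }
        cluster.put("Solr101", Set.of("overseer"));
        System.out.println(overseerCandidates(cluster, Policy.ALL_ROLES).size()); // 101
        System.out.println(overseerCandidates(cluster, Policy.DATA_ONLY));        // [Solr101]
    }
}
```

Under `ALL_ROLES`, Solr101 only becomes the dedicated overseer once the other 100 nodes are restarted with explicit roles. Adding a new role such as "zookeeper" to `ALL_KNOWN_ROLES` in a later version would also silently grant it to every legacy node. That is the upgrade problem described in point 2.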
>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator to do more work; I would prefer small, focused roles with names that accurately describe their function. In that light, COORDINATOR might be too nebulous. How about an AGGREGATOR role? (What I was thinking of would better be called a QUERY_ANALYSIS role.)
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> If you want to do specific things like query analysis, query aggregation or bulk indexing, all of those can be done on COORDINATOR nodes (as is the case in Elasticsearch). Having tens of "small focused roles" defined as first class concepts would be confusing to the user. As a remedy for your situation, where you want the coordinator role to also do query analysis for shards, one possible solution is to send such a query to a coordinator node with a parameter like "coordinator.query_analysis=true". The coordinator then, instead of blindly hitting remote shards, also does some extra work on behalf of the shards.
>>>>>>>>>> >>> >> >> >> >>>>>>>>
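Here is a minimal Java sketch of the per-request opt-in suggested above. Only the parameter name `coordinator.query_analysis` comes from the email. The step names, class and method are hypothetical placeholders, not Solr request-handling code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the opt-in described above: the coordinator does extra
// work on behalf of the shards only when the request asks for it. The step names
// are placeholders; only the parameter name comes from the email.
final class CoordinatorPlan {
    static final String QUERY_ANALYSIS_PARAM = "coordinator.query_analysis";

    /** The steps a coordinator would run for a request with these parameters, in order. */
    static List<String> plan(Map<String, String> params) {
        boolean analyze = Boolean.parseBoolean(params.getOrDefault(QUERY_ANALYSIS_PARAM, "false"));
        return analyze
                ? List.of("analyze-query", "fan-out-to-shards", "merge-results")
                : List.of("fan-out-to-shards", "merge-results");
    }

    public static void main(String[] args) {
        System.out.println(plan(Map.of()));
        System.out.println(plan(Map.of(QUERY_ANALYSIS_PARAM, "true")));
    }
}
```

The design point is that the extra behavior is selected per request, so it needs no additional node role.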
>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for example (replicas of that collection can only be
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in addition to the other role based constraints),
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > the set of roles should be user extensible and not fixed.
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the constraints introduced by roles apply to all collections
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > equally which might be insufficient if a user needs for example a heavily used collection to
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware collections are orthogonal topics. What you describe above can be achieved by the autoscaling+replica placement framework, where the placement plugins take the node roles as one of the inputs.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early on: the set of roles need to be expandable by a user
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > by creating a collection with new roles for example (consumed by placement plugins) and be
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > able to start nodes with new (arbitrary) roles. Should such roles follow some naming syntax to
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > differentiate them from built in roles? To be able to fail on typos on roles - that otherwise can be
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > crippling and hard to debug. This implies in any case that the current design can't assume all
>>>>>>>>>> >>> >> >> >> >>>>>>>>> > roles are known at compile time or define them in a Java enum.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
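Ilan's naming-syntax question can be sketched in Java: roles are plain strings rather than an enum, built-in names form a known set, user-defined names must carry a reserved prefix, and anything else is rejected as a likely typo. The `user.` prefix is an assumed convention, not a Solr rule. "coordinator" and "zookeeper" appear only because they are proposed in this thread.

```java
import java.util.Set;

// Hypothetical sketch: built-in roles are a known set, user-defined roles must
// use a reserved prefix, and anything else is rejected as a likely typo.
final class RoleNameValidator {
    static final Set<String> BUILT_IN = Set.of("data", "overseer", "coordinator", "zookeeper");
    static final String USER_PREFIX = "user."; // assumed naming convention, not a Solr rule

    static void validate(String role) {
        if (BUILT_IN.contains(role)) {
            return;
        }
        if (role.startsWith(USER_PREFIX) && role.length() > USER_PREFIX.length()) {
            return;
        }
        throw new IllegalArgumentException(
                "Unknown role '" + role + "': not built in and not prefixed with '" + USER_PREFIX + "'");
    }

    public static void main(String[] args) {
        validate("overseer");
        validate("user.gpu");
        try {
            validate("overseeer"); // typo
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```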
>>>>>>>>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by something different from roles, something like node labels (user defined), which can then be used in a replica placement plugin to assign replicas. I see roles as more closely associated with the kinds of functionality a node is designated for. Therefore, I feel that replica placements and user defined node labels are out of scope for this SIP. They can be added later in a separate SIP, without being at odds with this proposal.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
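To show how roles and user-defined labels could stay separate yet both feed placement, here is a hypothetical Java filter (Java 16+ for the record). It is not Solr's placement plugin API. The `tier` label and the node names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not Solr's placement plugin API: roles say what a node is
// for, user-defined labels describe it, and a placement step filters on both.
final class LabelAwarePlacement {
    record NodeInfo(String name, Set<String> roles, Map<String, String> labels) {}

    /** Data nodes whose label `key` equals `value`, in input order. */
    static List<String> candidates(List<NodeInfo> nodes, String key, String value) {
        return nodes.stream()
                .filter(n -> n.roles().contains("data"))
                .filter(n -> value.equals(n.labels().get(key)))
                .map(NodeInfo::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = List.of(
                new NodeInfo("node1", Set.of("data"), Map.of("tier", "large")),
                new NodeInfo("node2", Set.of("data"), Map.of("tier", "small")),
                new NodeInfo("node101", Set.of("overseer"), Map.of("tier", "large")));
        System.out.println(candidates(nodes, "tier", "large")); // [node1]
    }
}
```

Here node101 is excluded despite its `tier=large` label, because its role says it does not host data. The label only narrows placement among eligible nodes.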
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No role == all roles. Explicit list of roles = exactly those roles.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is something handled as a feature of the role rather than via role designation?
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Yeah, we always need an overseer, so that feature can decide to use its list of nodes as a preference if it so chooses.
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we always prefix Solr env vars and sys props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo. That way we can get away from having to have explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every single property. Instead, we can parse all ENVs and props with the solr prefix in our bootstrap code. And we can by convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
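Jan's prefix convention can be sketched in Java as a single generic mapping from `SOLR_*` environment variables to `solr.*` properties, so `SOLR_NODE_ROLES=foo` and `-Dsolr.node.roles=foo` end up as the same setting. The class is hypothetical. Letting an explicit `-D` property override the environment is an assumption, not something stated in the email.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the prefix convention: every SOLR_* environment variable
// maps to a solr.* property (SOLR_NODE_ROLES -> solr.node.roles). Assumption: an
// explicit -D system property wins over the environment.
final class SolrPropsBootstrap {
    static String envToProp(String envName) {
        return envName.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    static Map<String, String> collect(Map<String, String> env, Map<String, String> sysProps) {
        Map<String, String> result = new HashMap<>();
        env.forEach((k, v) -> {
            if (k.startsWith("SOLR_")) {
                result.put(envToProp(k), v);
            }
        });
        // Applied second, so a solr.* system property overrides the same env-derived key.
        sysProps.forEach((k, v) -> {
            if (k.startsWith("solr.")) {
                result.put(k, v);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sys = new HashMap<>();
        for (String name : System.getProperties().stringPropertyNames()) {
            sys.put(name, System.getProperty(name));
        }
        System.out.println(collect(System.getenv(), sys).get("solr.node.roles"));
    }
}
```

One design consequence is that the mapping is lossy. A property whose name itself contains an underscore cannot be expressed as an environment variable under this rule.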
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Jan
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail:
>>>>>>>>>> [email protected]
>>>>>>>>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail:
>>>>>>>>>> [email protected]
>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>> >>> >> >> >> >>>>>> --
>>>>>>>>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play)
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> -----------------------------------------------------
>>>>> Noble Paul
>>>>>
>>>>>
>>>>>
