In other words, roles are all "positive", but their consequences are only negative (rejecting when the matching positive role is not present).
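To make that concrete, here is a minimal sketch of the idea. All names are hypothetical (NodeRolesSketch, Role, effectiveRoles, requireRole and candidatesWithRole are not existing Solr classes or methods). It assumes the default discussed in this thread, where a node started with no explicit roles gets all of them. The node-side check only ever rejects, and the caller-side filter is what a replica placement plugin or the Overseer election code would apply before asking a node to do anything.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in Solr.
final class NodeRolesSketch {

    /** A few of the roles named in this thread; the enum itself is an assumption. */
    enum Role { DATA, QUERY, OVERSEER }

    /** Assumed default: a node with no configured roles is allowed all roles. */
    static Set<Role> effectiveRoles(Set<Role> configured) {
        return configured.isEmpty()
            ? EnumSet.allOf(Role.class)
            : EnumSet.copyOf(configured);
    }

    /** Node-side check: the only consequence of a role is rejecting when it is absent. */
    static void requireRole(Set<Role> nodeRoles, Role needed, String action) {
        if (!nodeRoles.contains(needed)) {
            throw new IllegalStateException(
                "Refusing to " + action + ": node lacks role " + needed);
        }
    }

    /** Caller-side filter: keep only the nodes that hold the needed role. */
    static List<String> candidatesWithRole(Map<String, Set<Role>> rolesByNode, Role needed) {
        return rolesByNode.entrySet().stream()
            .filter(e -> effectiveRoles(e.getValue()).contains(needed))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<Role>> cluster = Map.of(
            "node1", EnumSet.of(Role.DATA),
            "node2", EnumSet.of(Role.QUERY, Role.OVERSEER),
            "node3", EnumSet.noneOf(Role.class)); // nothing configured: all roles

        // A placement plugin would only consider these nodes for a new replica.
        System.out.println(candidatesWithRole(cluster, Role.DATA));

        // A node without "data" refuses to create a core if asked anyway.
        try {
            requireRole(effectiveRoles(cluster.get("node2")), Role.DATA, "create a core");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The split matters: the node never tries to fix a bad request, it just fails it, so whatever issues the request is responsible for picking a qualifying node in the first place.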
We can also consider no role defined = all roles allowed. That will make things simpler.

On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]> wrote:

> How do we expect the roles to be used?
> One way I see is a node refusing to do anything related to a role it
> doesn't have.
> For example if a node does not have role "data", any attempt to create a
> core on it would fail.
> A node not having the role "query" will refuse to have anything to do
> with handling a query etc.
> Then it would be up to other code to make sure only the appropriate nodes
> are requested to do any type of action.
> So for example any replica placement code plugin would have to restrict
> the set of candidate nodes for a new replica placement to those having
> "data". Otherwise the call would fail, and there should be nothing the
> replica placement code can do about it.
>
> Similarly, the "overseer" role would limit the nodes that participate in
> the Overseer election. The Overseer election code would have to remove (or
> not add) all non-qualifying nodes from the election, and we should expect a
> node without role "overseer" to refuse to start the Overseer machinery if
> asked to...
>
> Trying to make the use case clear regarding how roles are used.
> Ilan
>
> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]> wrote:
>
>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya <
>> [email protected]> wrote:
>>
>>> Hi Gus,
>>>
>>> > I think that we should expand/edit your list of roles to be
>>>
>>> The list can be expanded as and when more isolation and features are
>>> needed. I only listed those roles that we already have functionality for
>>> or that are under development.
>>>
>>
>> Well all of those roles (except zookeeper) are things nodes do today. As
>> it stands they are all doing all of them. What we add support for as we
>> move forward is starting without a role, and add the zookeeper role when
>> that feature is ready.
>>
>>> > I would like to recommend that the roles be all positive ("Can do
>>> > this") and nodes with no role at all are ineligible for all activities.
>>>
>>> It comes down to the defaults and backcompat. If we want all Solr nodes
>>> to be able to host data replicas by default (without user explicitly
>>> specifying role=data), then we need a way to unset this role. The most
>>> reasonable way sounded like a "!data". We can do away with !data if we
>>> mandate each and every data node have the role "data" explicitly defined
>>> for it, which breaks backcompat and also is cumbersome to use for those who
>>> don't want to use these special roles.
>>>
>>
>> Not sure I understand, which of the roles I mentioned (other than
>> zookeeper, which I expect is intended as different from our current
>> embedded zk) is NOT currently supported by a single cloud node brought up
>> as shown in our tutorials/docs? I'm certainly not proposing that the
>> default change to nothing. The default is all roles, unless you specify
>> roles at startup.
>>
>>> > I also suggest that these roles each have a node in zookeeper listing
>>> > the current member nodes (as child nodes) so that code that wants to find a
>>> > node with an appropriate role does not need to scan the list of all nodes
>>> > parsing something to discover which nodes apply and also does not have to
>>> > parse json to do it.
>>>
>>> /roles.json exists today, it has role as key and list of nodes as value.
>>> In the next major version, we can change the format of that file and use
>>> key as node, value as list of roles. Or, maybe we can go for adding the
>>> roles to the data for each item in the list of live_nodes.
>>>
>>
>> I'm not finding anything in our documentation about roles.json so I think
>> it's an internal implementation detail, which reduces back compat concerns.
>> ADDROLE/REMOVEROLE don't accept json or anything like that and could be
>> made to work with zk nodes too.
>>
>> The fact that some precursor work was done without a SIP (or before SIPs
>> existed) should not hamstring our design once a SIP that clearly covers the
>> same topic is under consideration. By their nature SIPs are non-trivial
>> and often will include compatibility breaks. Good news is I don't think I
>> see one here, just a code change to transition to a different zk backend. I
>> think that it's probably a mistake to consider our zookeeper data a public
>> API and we should be moving away from that or at the very least segregating
>> clearly what in zk is long term reliable. Ideally our v1/v2 APIs should be
>> the public API through which information about the cluster is obtained.
>> Programming directly against zk is kind of like a custom build of solr.
>> Sometimes useful and appropriate, but maintenance is your concern. For code
>> plugging into solr, it should in theory be against an internal information
>> java api, and zookeeper should not be touched directly. (I know this is not
>> in a good state or at least wasn't last time I looked closely, but it
>> should be where we are heading).
>>
>>> > any code seeking to transition a node
>>>
>>> We considered this situation and realized that it is very risky to have
>>> nodes change roles while they are up and running. Better to assign fixed
>>> roles upon startup.
>>>
>>
>> I agree that concurrency is hard. I definitely think startup time
>> assignments should be involved here. I'm not thinking that every transition
>> must be supported. As a starting point it would be fine if none were.
>> Having something suddenly become zookeeper is probably tricky to support
>> (see discussion in that thread regarding nodes not actually participating
>> until they have a partner to join with them to avoid even numbered
>> clusters), but I think the design should not preclude the possibility of
>> nodes becoming eligible for some roles or withdrawing from some roles, and
>> treatment of roles should be consistent. In some cases someone may decide
>> it's worth the work of handling the concurrency concerns, best if they
>> don't have to break back compat or hack their code around the assumption it
>> wouldn't happen to do it.
>>
>> Taking the zookeeper case as an example, it very much might be desirable
>> to have the possibility to heal the zk cluster by promoting another node
>> (configured as eligible for zk) to active zk duty if one of the current zk
>> nodes has been down long enough (say on prem hardware, motherboard pops a
>> capacitor, server gone for a week while new hardware is purchased, built
>> and configured). Especially if the down node didn't hold data or other
>> nodes had sufficient replicas and the cluster is still answering queries
>> just fine.
>>
>>> > I know of a case that would benefit from having separate Query/Update
>>> > nodes that handle a heavy analysis process which would be deployed to a
>>> > number of CPU heavy boxes (which might add more in prep for bulk indexing,
>>> > and remove them when bulk was done), data could then be hosted on cheaper
>>> > nodes....
>>>
>>> This is the main motivation behind this work. SOLR-15715 needs this, and
>>> hence it would be good to get this in as soon as possible.
>>>
>>
>> I think we can incrementally work towards configurability for all of
>> these roles. The current default state is that a node has all roles and the
>> incremental progress is to enable removing a role from a node. This I think
>> is why it might be good to
>>
>> A) Determine the set of roles our current solr nodes are performing (that
>> might be removed in some scenario) and document this via assigning these
>> roles as default on as this SIP goes live.
>> B) Figure out what the process of adding something entirely new that we
>> haven't yet thought of with its own role would look like.
>>
>> I think it would be great if we not only satisfied the current need but
>> determined how we expect this to change over time.
>>
>>> Regards,
>>> Ishan
>>>
>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]> wrote:
>>>
>>>> The SIP looks like a good start, and I was already thinking of
>>>> something very similar to this as a follow on to my attempts to split the
>>>> uber filter (SolrDispatchFilter) into servlets such that roles determine
>>>> what servlets are deployed, but I would like to recommend that the roles be
>>>> all positive ("Can do this") and nodes with no role at all are ineligible
>>>> for all activities. (just like standard role permissioning systems). This
>>>> will make it much more familiar and easy to think about. Therefore there
>>>> would be no need for a role such as !data which I presume was meant to mean
>>>> "no data on this node"... rather just don't give the "data" role to the
>>>> node.
>>>>
>>>> Additional node roles I think should exist:
>>>>
>>>> I think that we should expand/edit your list of roles to be
>>>>
>>>> - QUERY - accepts and analyzes queries up to the point of actually
>>>>   consulting the lucene index (useful if you have a very heavy analysis
>>>>   phase)
>>>> - UPDATE - accepts update requests, and performs update
>>>>   functionality prior to and including DistributedUpdateProcessorFactory
>>>>   (useful if you have a very heavy analysis phase)
>>>> - ADMIN - accepts admin/management commands
>>>> - UI - hosts an admin ui
>>>> - ZOOKEEPER - hosts embedded zookeeper
>>>> - OVERSEER - performs overseer related functionality (though IIRC
>>>>   there's a proposal to eliminate overseer that might eliminate this)
>>>> - DATA - nodes where there is a lucene index and matching against
>>>>   the analyzed results of a query may be conducted to generate a response,
>>>>   also performs update steps that come after
>>>>   DistributedUpdateProcessorFactory
>>>>
>>>> I also suggest that these roles each have a node in zookeeper listing
>>>> the current member nodes (as child nodes) so that code that wants to find a
>>>> node with an appropriate role does not need to scan the list of all nodes
>>>> parsing something to discover which nodes apply and also does not have to
>>>> parse json to do it. I think this will be particularly key for zookeeper
>>>> nodes which might be 3 out of 100 or more nodes. Similar to how we track
>>>> live nodes. I think we should have a nodes.json too that tracks what roles
>>>> a node is ALLOWED to take (as opposed to which roles it is currently
>>>> servicing)
>>>>
>>>> So running code consults the zookeeper role list of nodes, and any code
>>>> seeking to transition a node (an admin operation with much lower
>>>> performance requirements) consults the json data in the nodes.json node,
>>>> parses it, finds the node in question and checks what it's eligible for
>>>> (this will correspond to which servlets/apps have been loaded).
>>>>
>>>> I know of a case that would benefit from having separate Query/Update
>>>> nodes that handle a heavy analysis process which would be deployed to a
>>>> number of CPU heavy boxes (which might add more in prep for bulk indexing,
>>>> and remove them when bulk was done), data could then be hosted on cheaper
>>>> nodes....
>>>>
>>>> Also maybe think about how this relates to NRT/TLOG/PULL which are also
>>>> maybe role like
>>>>
>>>> WDYT?
>>>>
>>>> -Gus
>>>>
>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's an SIP for introducing the concept of node roles:
>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>
>>>>> We also wish to add first class support for Query nodes that are used
>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>> them and presenting to users. This concept exists as first class citizens
>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>
>>>>> Regards,
>>>>> Ishan / Noble / Hitesh
>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
