> In other words, roles are all "positive", but their consequences are only
> negative (rejecting when the matching positive role is not present).
>

Yeah, right. To do something, the machine needs the role.
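
For what it's worth, here is a minimal sketch of that rule, assuming roles arrive as a comma-separated startup value and that an empty value is expanded to every role up front. The class, method, and enum names are hypothetical and not taken from Solr's code base; the role names follow the list proposed further down this thread.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in Solr. */
final class NodeRolesSketch {

  /** Role names as proposed in this thread. */
  enum Role { QUERY, UPDATE, ADMIN, UI, ZOOKEEPER, OVERSEER, DATA }

  private final Set<Role> roles;

  private NodeRolesSketch(Set<Role> roles) {
    this.roles = Collections.unmodifiableSet(roles);
  }

  /**
   * Parses a comma-separated startup value such as "data,overseer".
   * Assumption: a null or blank value means "all roles", and that is made
   * explicit here so later checks never need a special empty case.
   */
  static NodeRolesSketch fromStartupValue(String value) {
    if (value == null || value.trim().isEmpty()) {
      return new NodeRolesSketch(EnumSet.allOf(Role.class));
    }
    EnumSet<Role> parsed = EnumSet.noneOf(Role.class);
    for (String name : value.split(",")) {
      // Role.valueOf throws IllegalArgumentException for unknown names.
      parsed.add(Role.valueOf(name.trim().toUpperCase(Locale.ROOT)));
    }
    return new NodeRolesSketch(parsed);
  }

  /** True if this node holds the given role. */
  boolean has(Role role) {
    return roles.contains(role);
  }

  /** The only consequence of a role is negative: reject when it is missing. */
  void require(Role role) {
    if (!roles.contains(role)) {
      throw new IllegalStateException("Node does not have role " + role);
    }
  }
}
```

With this shape, a node started without any role value passes `require(Role.DATA)`, while a node started with `query,overseer` throws on it, so callers such as placement code have to pick eligible nodes themselves.
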
> We can also consider no role defined = all roles allowed. Will make things
> simpler.
>

In terms of the startup command, yes. Internally we should have all roles explicitly assigned when no roles are specified at startup, so that the code doesn't have a million if checks for the empty case.

>
> On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]> wrote:
>
>> How do we expect the roles to be used?
>> One way I see is a node refusing to do anything related to a role it
>> doesn't have.
>> For example if a node does not have role "data", any attempt to create a
>> core on it would fail.
>> A node not having the role "query" will refuse to have anything to do
>> with handling a query etc.
>> Then it would be up to other code to make sure only the appropriate nodes
>> are requested to do any type of action.
>> So for example any replica placement code plugin would have to restrict
>> the set of candidate nodes for a new replica placement to those having
>> "data". Otherwise the call would fail, and there should be nothing the
>> replica placement code can do about it.
>>
>> Similarly, the "overseer" role would limit the nodes that participate in
>> the Overseer election. The Overseer election code would have to remove (or
>> not add) all non qualifying nodes from the election, and we should expect a
>> node without role "overseer" to refuse to start the Overseer machinery if
>> asked to...
>>
>> Trying to make the use case clear regarding how roles are used.
>> Ilan
>>
>> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>
>>>> Hi Gus,
>>>>
>>>> > I think that we should expand/edit your list of roles to be
>>>>
>>>> The list can be expanded as and when more isolation and features are
>>>> needed. I only listed those roles that we already have a functionality for
>>>> or that are under development.
>>>>
>>>
>>> Well all of those roles (except zookeeper) are things nodes do today. As
>>> it stands they are all doing all of them. What we add support for as we
>>> move forward is starting without a role, and add the zookeeper role when
>>> that feature is ready.
>>>
>>>
>>>> > I would like to recommend that the roles be all positive ("Can do
>>>> this") and nodes with no role at all are ineligible for all activities.
>>>>
>>>> It comes down to the defaults and backcompat. If we want all Solr nodes
>>>> to be able to host data replicas by default (without user explicitly
>>>> specifying role=data), then we need a way to unset this role. The most
>>>> reasonable way sounded like a "!data". We can do away with !data if we
>>>> mandate each and every data node have the role "data" explicitly defined
>>>> for it, which breaks backcompat and also is cumbersome to use for those who
>>>> don't want to use these special roles.
>>>>
>>>>
>>> Not sure I understand, which of the roles I mentioned (other than
>>> zookeeper, which I expect is intended as different from our current
>>> embedded zk) is NOT currently supported by a single cloud node brought up
>>> as shown in our tutorials/docs? I'm certainly not proposing that the
>>> default change to nothing. The default is all roles, unless you specify
>>> roles at startup.
>>>
>>>
>>>> > I also suggest that these roles each have a node in zookeeper listing
>>>> the current member nodes (as child nodes) so that code that wants to find a
>>>> node with an appropriate role does not need to scan the list of all nodes
>>>> parsing something to discover which nodes apply and also does not have to
>>>> parse json to do it.
>>>>
>>>> /roles.json exists today, it has role as key and list of nodes as
>>>> value. In the next major version, we can change the format of that file and
>>>> use key as node, value as list of roles. Or, maybe we can go for adding the
>>>> roles to the data for each item in the list of live_nodes.
>>>>
>>>>
>>> I'm not finding anything in our documentation about roles.json so I
>>> think it's an internal implementation detail, which reduces back compat
>>> concerns. ADDROLE/REMOVEROLE don't accept json or anything like that and
>>> could be made to work with zk nodes too.
>>>
>>> The fact that some precursor work was done without a SIP (or before SIPs
>>> existed) should not hamstring our design once a SIP that clearly covers the
>>> same topic is under consideration. By their nature SIPs are non-trivial
>>> and often will include compatibility breaks. Good news is I don't think I
>>> see one here, just a code change to transition to a different zk backend. I
>>> think that it's probably a mistake to consider our zookeeper data a public
>>> API and we should be moving away from that or at the very least segregating
>>> clearly what in zk is long term reliable. Ideally our v1/v2 APIs should be
>>> the public API through which information about the cluster is obtained.
>>> Programming directly against zk is kind of like a custom build of solr.
>>> Sometimes useful and appropriate, but maintenance is your concern. For code
>>> plugging into solr, it should in theory be against an internal information
>>> java api, and zookeeper should not be touched directly. (I know this is not
>>> in a good state or at least wasn't last time I looked closely, but it
>>> should be where we are heading).
>>>
>>>> > any code seeking to transition a node
>>>>
>>>> We considered this situation and realized that it is very risky to have
>>>> nodes change roles while they are up and running. Better to assign fixed
>>>> roles upon startup.
>>>>
>>>
>>> I agree that concurrency is hard. I definitely think startup time
>>> assignments should be involved here. I'm not thinking that every transition
>>> must be supported. As a starting point it would be fine if none were.
>>> Having something suddenly become zookeeper is probably tricky to support
>>> (see discussion in that thread regarding nodes not actually participating
>>> until they have a partner to join with them to avoid even numbered
>>> clusters), but I think the design should not preclude the possibility of
>>> nodes becoming eligible for some roles or withdrawing from some roles, and
>>> treatment of roles should be consistent. In some cases someone may decide
>>> it's worth the work of handling the concurrency concerns, best if they
>>> don't have to break back compat or hack their code around the assumption it
>>> wouldn't happen to do it.
>>>
>>> Taking the zookeeper case as an example, it very much might be desirable
>>> to have the possibility to heal the zk cluster by promoting another node
>>> (configured as eligible for zk) to active zk duty if one of the current zk
>>> nodes has been down long enough (say on prem hardware, motherboard pops a
>>> capacitor, server gone for a week while new hardware is purchased, built
>>> and configured). Especially if the down node didn't hold data or other
>>> nodes had sufficient replicas and the cluster is still answering queries
>>> just fine.
>>>
>>>
>>>>
>>>> > I know of a case that would benefit from having separate Query/Update
>>>> nodes that handle a heavy analysis process which would be deployed to a
>>>> number of CPU heavy boxes (which might add more in prep for bulk indexing,
>>>> and remove them when bulk was done), data could then be hosted on cheaper
>>>> nodes....
>>>>
>>>> This is the main motivation behind this work. SOLR-15715 needs this,
>>>> and hence it would be good to get this in as soon as possible.
>>>>
>>>
>>> I think we can incrementally work towards configurability for all of
>>> these roles. The current default state is that a node has all roles and the
>>> incremental progress is to enable removing a role from a node.
>>> This I think is why it might be good to
>>>
>>> A) Determine the set of roles our current solr nodes are performing
>>> (that might be removed in some scenario) and document this via assigning
>>> these roles as default on as this SIP goes live.
>>> B) Figure out what the process of adding something entirely new that we
>>> haven't yet thought of with its own role would look like.
>>>
>>> I think it would be great if we not only satisfied the current need but
>>> determined how we expect this to change over time.
>>>
>>>
>>>> Regards,
>>>> Ishan
>>>>
>>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]> wrote:
>>>>
>>>>> The SIP looks like a good start, and I was already thinking of
>>>>> something very similar to this as a follow on to my attempts to split the
>>>>> uber filter (SolrDispatchFilter) into servlets such that roles determine
>>>>> what servlets are deployed, but I would like to recommend that the roles be
>>>>> all positive ("Can do this") and nodes with no role at all are ineligible
>>>>> for all activities. (just like standard role permissioning systems). This
>>>>> will make it much more familiar and easy to think about. Therefore there
>>>>> would be no need for a role such as !data which I presume was meant to mean
>>>>> "no data on this node"... rather just don't give the "data" role to the
>>>>> node.
>>>>>
>>>>> Additional node roles I think should exist:
>>>>>
>>>>> I think that we should expand/edit your list of roles to be
>>>>>
>>>>> - QUERY - accepts and analyzes queries up to the point of actually
>>>>>   consulting the lucene index (useful if you have a very heavy analysis
>>>>>   phase)
>>>>> - UPDATE - accepts update requests, and performs update
>>>>>   functionality prior to and including DistributedUpdateProcessorFactory
>>>>>   (useful if you have a very heavy analysis phase)
>>>>> - ADMIN - accepts admin/management commands
>>>>> - UI - hosts an admin ui
>>>>> - ZOOKEEPER - hosts embedded zookeeper
>>>>> - OVERSEER - performs overseer related functionality (though IIRC
>>>>>   there's a proposal to eliminate overseer that might eliminate this)
>>>>> - DATA - nodes where there is a lucene index and matching against
>>>>>   the analyzed results of a query may be conducted to generate a response,
>>>>>   also performs update steps that come after
>>>>>   DistributedUpdateProcessorFactory
>>>>>
>>>>> I also suggest that these roles each have a node in zookeeper listing
>>>>> the current member nodes (as child nodes) so that code that wants to find a
>>>>> node with an appropriate role does not need to scan the list of all nodes
>>>>> parsing something to discover which nodes apply and also does not have to
>>>>> parse json to do it. I think this will be particularly key for zookeeper
>>>>> nodes which might be 3 out of 100 or more nodes. Similar to how we track
>>>>> live nodes.
>>>>> I think we should have a nodes.json too that tracks what roles
>>>>> a node is ALLOWED to take (as opposed to which roles it is currently
>>>>> servicing)
>>>>>
>>>>> So running code consults the zookeeper role list of nodes, and any
>>>>> code seeking to transition a node (an admin operation with much lower
>>>>> performance requirements) consults the json data in the nodes.json node,
>>>>> parses it, finds the node in question and checks what it's eligible for
>>>>> (this will correspond to which servlets/apps have been loaded).
>>>>>
>>>>> I know of a case that would benefit from having separate Query/Update
>>>>> nodes that handle a heavy analysis process which would be deployed to a
>>>>> number of CPU heavy boxes (which might add more in prep for bulk indexing,
>>>>> and remove them when bulk was done), data could then be hosted on cheaper
>>>>> nodes....
>>>>>
>>>>> Also maybe think about how this relates to NRT/TLOG/PULL which are
>>>>> also maybe role like
>>>>>
>>>>> WDYT?
>>>>>
>>>>> -Gus
>>>>>
>>>>>
>>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>
>>>>>> We also wish to add first class support for Query nodes that are used
>>>>>> to process user queries by forwarding to data nodes, merging/aggregating
>>>>>> them and presenting to users. This concept exists as first class citizens
>>>>>> in most other search engines. This is a chance for Solr to catch up.
>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>
>>>>>> Regards,
>>>>>> Ishan / Noble / Hitesh
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
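
To make the data-layout question in this thread concrete: /roles.json is described above as role -> list of nodes, with node -> list of roles floated as an alternative. A rough sketch of converting between the two, and of the candidate filtering that placement code would need, might look like the following. Everything here is an assumption for illustration: the class and method names are hypothetical, the maps stand in for whatever is actually stored in zookeeper, and a node missing from the map is treated as holding no roles.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch only; not Solr's actual roles handling. */
final class RoleIndexSketch {

  /** Inverts role -> nodes into node -> roles. */
  static Map<String, Set<String>> invert(Map<String, List<String>> roleToNodes) {
    Map<String, Set<String>> nodeToRoles = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : roleToNodes.entrySet()) {
      for (String node : entry.getValue()) {
        nodeToRoles.computeIfAbsent(node, n -> new TreeSet<>()).add(entry.getKey());
      }
    }
    return nodeToRoles;
  }

  /**
   * Keeps only the live nodes that hold the given role, preserving the order
   * of liveNodes. Assumption: a node absent from nodeToRoles has no roles.
   */
  static List<String> candidates(
      Collection<String> liveNodes, Map<String, Set<String>> nodeToRoles, String role) {
    List<String> result = new ArrayList<>();
    for (String node : liveNodes) {
      Set<String> roles = nodeToRoles.getOrDefault(node, Collections.emptySet());
      if (roles.contains(role)) {
        result.add(node);
      }
    }
    return result;
  }
}
```

Replica placement would call `candidates(liveNodes, nodeToRoles, "data")` before choosing a node, and the Overseer election would do the same with "overseer".
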
