As for the "query" role, let's name it something better like "compute",
since data nodes are always going to be "querying". "compute" is for
something like the first node for a distributed query, or a
StreamingExpressions query.

But I agree with the idea that roles should only be "positive"; you
shouldn't be able to specify -Dnode.roles=!data.
Also if no nodes have a given role specified, then all nodes should be
considered for that role. E.g. if no live nodes have roles=overseer (or
roles=all), then we should just select any node to be overseer. This should
be the same for compute, data, etc.
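
To make that fallback concrete, here is a rough sketch in Python (chosen
only for brevity). Nothing here is Solr code: the node names, the
`roles_by_node` mapping and the function name are all invented for
illustration.

```python
def candidates_for(role, roles_by_node):
    """Return the live nodes eligible for `role`.

    roles_by_node is a hypothetical mapping of live node name -> set of
    roles the node was started with. "all" matches every role.
    """
    explicit = sorted(
        node for node, roles in roles_by_node.items()
        if role in roles or "all" in roles
    )
    # Fallback: if no live node claimed the role, every live node is a candidate.
    return explicit if explicit else sorted(roles_by_node)


live = {
    "node1": {"data"},
    "node2": {"data"},
    "node3": {"compute"},
}
print(candidates_for("compute", live))   # ['node3']
print(candidates_for("overseer", live))  # ['node1', 'node2', 'node3']
```

The same function would serve overseer, compute and data alike, which is
the point: the fallback is a property of role lookup, not of any one role.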

> So, for the proposal, let's say "data" is a special role which is assumed by
> default, and is enabled on all nodes unless there's a !data.
>

Instead of this, maybe we could have role groups, such as admin~=overseer,zk
or worker~=compute,data,updateProcessing.
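
As a rough illustration of what I mean by groups (a Python sketch, not a
proposed implementation; the group table just copies the two examples
above, and `expand_roles` is a made-up name):

```python
# Hypothetical group table, taken from the two examples in this mail.
ROLE_GROUPS = {
    "admin": {"overseer", "zk"},
    "worker": {"compute", "data", "updateProcessing"},
}


def expand_roles(spec):
    """Expand a comma-separated roles string, replacing group names with their members."""
    roles = set()
    for name in spec.split(","):
        name = name.strip()
        if not name:
            continue
        # A name that is not a group stands for itself.
        roles |= ROLE_GROUPS.get(name, {name})
    return roles


print(sorted(expand_roles("admin,data")))  # ['data', 'overseer', 'zk']
```

A node started with a group name would then simply hold the positive roles
the group expands to, so no negation syntax is needed.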

As for the suggested Roles, I'm not sure ADMIN or UI really fit, since
there is another option to disable the UI for a solr node, and various
ADMIN commands have to be accepted across other node roles. (Data nodes
require the Collections API, same with the overseer.)

- Houston

On Wed, Oct 27, 2021 at 1:34 PM Ishan Chattopadhyaya <
[email protected]> wrote:

> bq. In other words, roles are all "positive", but their consequences are
> only negative (rejecting when the matching positive role is not present).
>
> Essentially, yes. A node that doesn't specify any role should be able to
> do everything.
>
> Let me just take a brief detour and mention our thoughts on the "query"
> role. While all data nodes can also be used for querying, our idea was to
> create a layer of nodes that have some special mechanism to be able to
> proxy/forward queries to data nodes (let's call it "pseudo cores" or
> "synthetic cores" or "proxy cores"). Our thought was that any node that has
> "query,!data" role would enable this special mode on startup (whereby
> requests are served by these special pseudo cores). We'll discuss this in
> detail in that issue.
>
> Back to the main subject here.
>
> Let's take a practical scenario:
> * Layer1: Organization has about 100 nodes, each node has many data
> replicas
> * Layer2: To manage such a large cluster reliably, they keep aside 4-5
> dedicated overseer nodes.
> * Layer3: Since query aggregations/coordination can potentially be
> expensive, they keep aside 5-10 query nodes.
>
> My preference would be as follows:
> * I'd like to refer to Layer1 nodes as the "data nodes" and hence get
> either no role defined for them or -Dnode.roles=data.
> * I'd like to refer to Layer2 nodes as "overseer nodes" (even though I
> understand, only one of them can be an overseer at a time). I'd like to
> have -Dnode.roles=!data,overseer
> * I'd like to refer to Layer3 nodes as "query nodes", with
> -Dnode.roles=!data,query
>
> ^ This seems very practical from an operational standpoint.
>
> So, for the proposal, let's say "data" is a special role which is assumed
> by default, and is enabled on all nodes unless there's a !data. It is
> presumed that data nodes can also serve queries directly, so adding a
> "query" to those nodes is meaningless (also because there's no practical
> benefit to stopping a data node from receiving a query for "!query" role to
> be useful).
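
For reference, the semantics in the quoted paragraph could be sketched
roughly as below. This is Python for brevity and only an illustration of
the quoted rule ("data" on by default, "!data" switches it off); the
function name and the parsing details are assumptions, not taken from the
SIP.

```python
def parse_node_roles(value=None):
    """Parse a node.roles value: 'data' is assumed unless '!data' is given."""
    roles = {"data"}  # default role under the quoted proposal
    if value:
        for token in value.split(","):
            token = token.strip()
            if not token:
                continue
            if token.startswith("!"):
                roles.discard(token[1:])  # negated role: remove it
            else:
                roles.add(token)
    return roles


print(sorted(parse_node_roles()))                  # ['data']
print(sorted(parse_node_roles("!data,overseer")))  # ['overseer']
print(sorted(parse_node_roles("!data,query")))     # ['query']
```

The three calls correspond to the Layer1, Layer2 and Layer3 startup
settings listed earlier in the quoted mail.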
>
> "query" role on nodes that don't host data really refers to a special
> capability for lightweight, stateless nodes. I don't want to add a "!query"
> on dedicated overseer nodes, and hence I don't want to assume that "query"
> is implicitly available on any node even if the role isn't specified.
>
> "overseer" role is complicated, since it is already defined and we don't
> have the opportunity to define it the right way. I'd hate having to put a
> "!overseer" on every data node on startup in order to have a few dedicated
> overseers.
>
> In short, in this SIP, I just wish to implement the concept of node roles
> and their handling. How individual roles are leveraged can be up to every
> new role's implementation.
>
>
>
>
>
> On Wed, Oct 27, 2021 at 9:54 PM Gus Heck <[email protected]> wrote:
>
>>
>>
>>> In other words, roles are all "positive", but their consequences are
>>> only negative (rejecting when the matching positive role is not present).
>>>
>>> Yeah right. to do something the machine needs the role
>>
>>
>>> We can also consider no role defined = all roles allowed. Will make
>>> things simpler.
>>>
>>
>> in terms of startup command yes. Internally we should have all explicitly
>> assigned when no roles are specified at startup so that the code doesn't
>> have a million if checks for the empty case
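
A small sketch of that normalization idea, in Python for brevity. The role
catalogue and both function names are hypothetical; the actual list of
roles is still being discussed in this thread.

```python
# Hypothetical role catalogue; the real list is not settled.
ALL_ROLES = frozenset({"data", "overseer", "query"})


def effective_roles(startup_roles):
    """Normalize once at startup: no roles given means every role, stored explicitly."""
    if not startup_roles:
        return ALL_ROLES
    return frozenset(startup_roles)


def can(node_roles, role):
    # Callers never special-case "no roles configured"; they only test membership.
    return role in node_roles


unconfigured = effective_roles(set())
print(can(unconfigured, "overseer"))               # True
print(can(effective_roles({"data"}), "overseer"))  # False
```

The empty case is handled in exactly one place, so the rest of the code
only ever sees an explicit set.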
>>
>>
>>>
>>> On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]>
>>> wrote:
>>>
>>>> How do we expect the roles to be used?
>>>> One way I see is a node refusing to do anything related to a role it
>>>> doesn't have.
>>>> For example if a node does not have role "data", any attempt to create
>>>> a core on it would fail.
>>>> A node not having the role "query", will refuse to have anything to do
>>>> with handling a query etc.
>>>> Then it would be up to other code to make sure only the appropriate
>>>> nodes are requested to do any type of action.
>>>> So for example any replica placement code plugin would have to restrict
>>>> the set of candidate nodes for a new replica placement to those having
>>>> "data". Otherwise the call would fail, and there should be nothing the
>>>> replica placement code can do about it.
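
The two halves of that contract (the node refuses, and the caller filters
beforehand) might look roughly like this. It is a Python sketch only; the
class, the exception and the function names are invented for illustration
and do not correspond to Solr's placement plugin API.

```python
class RoleNotPresentError(Exception):
    """Raised when a node is asked to act without the matching role (hypothetical)."""


class Node:
    def __init__(self, name, roles):
        self.name = name
        self.roles = frozenset(roles)
        self.cores = []

    def create_core(self, core_name):
        # The node itself enforces the rule, whatever the caller intended.
        if "data" not in self.roles:
            raise RoleNotPresentError(f"{self.name} does not have role 'data'")
        self.cores.append(core_name)


def placement_candidates(nodes):
    """Placement code restricts itself to nodes holding the 'data' role."""
    return [n for n in nodes if "data" in n.roles]


nodes = [Node("node1", {"data"}), Node("node2", {"overseer"})]
target = placement_candidates(nodes)[0]
target.create_core("collection1_shard1_replica_n1")
print(target.name, target.cores)
```

Calling `create_core` on node2 would raise, which is the "nothing the
replica placement code can do about it" case.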
>>>>
>>>> Similarly, the "overseer" role would limit the nodes that participate
>>>> in the Overseer election. The Overseer election code would have to remove
>>>> (or not add) all non qualifying nodes from the election, and we should
>>>> expect a node without role "overseer" to refuse to start the Overseer
>>>> machinery if asked to...
>>>>
>>>> Trying to make the use case clear regarding how roles are used.
>>>> Ilan
>>>>
>>>> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Gus,
>>>>>>
>>>>>> > I think that we should expand/edit your list of roles to be
>>>>>>
>>>>>> The list can be expanded as and when more isolation and features are
>>>>>> needed. I only listed those roles that we already have functionality
>>>>>> for or that are under development.
>>>>>>
>>>>>
>>>>> Well all of those roles (except zookeeper) are things nodes do today.
>>>>> As it stands they are all doing all of them. What we add support for as we
>>>>> move forward is starting without a role, and add the zookeeper role when
>>>>> that feature is ready.
>>>>>
>>>>>
>>>>>> > I would like to recommend that the roles be all positive ("Can do
>>>>>> this") and nodes with no role at all are ineligible for all activities.
>>>>>>
>>>>>> It comes down to the defaults and backcompat. If we want all Solr
>>>>>> nodes to be able to host data replicas by default (without user
>>>>>> explicitly specifying role=data), then we need a way to unset this role.
>>>>>> The most reasonable way sounded like a "!data". We can do away with
>>>>>> !data if we mandate each and every data node have the role "data"
>>>>>> explicitly defined for it, which breaks backcompat and also is
>>>>>> cumbersome to use for those who don't want to use these special roles.
>>>>>>
>>>>>>
>>>>> Not sure I understand, which of the roles I mentioned (other than
>>>>> zookeeper, which I expect is intended as different from our current
>>>>> embedded zk) is NOT currently supported by a single cloud node brought up
>>>>> as shown in our tutorials/docs? I'm certainly not proposing that the
>>>>> default change to nothing. The default is all roles, unless you specify
>>>>> roles at startup.
>>>>>
>>>>>
>>>>>> > I also suggest that these roles each have a node in zookeeper
>>>>>> listing the current member nodes (as child nodes) so that code that wants
>>>>>> to find a node with an appropriate role does not need to scan the list of
>>>>>> all nodes parsing something to discover which nodes apply and also does not
>>>>>> have to parse json to do it.
>>>>>>
>>>>>> /roles.json exists today, it has role as key and list of nodes as
>>>>>> value. In the next major version, we can change the format of that file and
>>>>>> use key as node, value as list of roles. Or, maybe we can go for adding the
>>>>>> roles to the data for each item in the list of live_nodes.
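
The format change described in the quoted paragraph (role as key becoming
node as key) amounts to inverting a mapping. A Python sketch for
illustration only: the sample content, including the node names and the
"data" entry, is made up and is not a dump of a real /roles.json.

```python
import json


def invert_roles(role_to_nodes):
    """Turn {role: [nodes]} into {node: [roles]}, sorted for stable output."""
    node_to_roles = {}
    for role, nodes in role_to_nodes.items():
        for node in nodes:
            node_to_roles.setdefault(node, []).append(role)
    return {node: sorted(roles) for node, roles in sorted(node_to_roles.items())}


# Made-up sample in the current shape: role as key, list of nodes as value.
current = json.loads(
    '{"overseer": ["host1:8983_solr", "host2:8983_solr"],'
    ' "data": ["host2:8983_solr"]}'
)
print(json.dumps(invert_roles(current)))
# {"host1:8983_solr": ["overseer"], "host2:8983_solr": ["data", "overseer"]}
```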
>>>>>>
>>>>>>
>>>>> I'm not finding anything in our documentation about roles.json so I
>>>>> think it's an internal implementation detail, which reduces back compat
>>>>> concerns. ADDROLE/REMOVEROLE don't accept json or anything like that and
>>>>> could be made to work with zk nodes too.
>>>>>
>>>>> The fact that some precursor work was done without a SIP (or before
>>>>> SIPs existed) should not hamstring our design once a SIP that clearly
>>>>> covers the same topic is under consideration. By their nature SIPs are
>>>>> non-trivial and often will include compatibility breaks. Good news is I
>>>>> don't think I see one here, just a code change to transition to a different
>>>>> zk backend. I think that it's probably a mistake to consider our zookeeper
>>>>> data a public API and we should be moving away from that or at the very
>>>>> least segregating clearly what in zk is long term reliable. Ideally our
>>>>> v1/v2 api's should be the public api through which information about the
>>>>> cluster is obtained. Programming directly against zk is kind of like a
>>>>> custom build of solr. Sometimes useful and appropriate, but maintenance is
>>>>> your concern. For code plugging into solr, it should in theory be against
>>>>> an internal information java api, and zookeeper should not be touched
>>>>> directly. (I know this is not in a good state or at least wasn't last time
>>>>> I looked closely, but it should be where we are heading).
>>>>>
>>>>> > any code seeking to transition a node
>>>>>>
>>>>>> We considered this situation and realized that it is very risky to
>>>>>> have nodes change roles while they are up and running. Better to assign
>>>>>> fixed roles upon startup.
>>>>>>
>>>>>
>>>>> I agree that concurrency is hard. I definitely think startup time
>>>>> assignments should be involved here. I'm not thinking that every transition
>>>>> must be supported. As a starting point it would be fine if none were.
>>>>> Having something suddenly become zookeeper is probably tricky to support
>>>>> (see discussion in that thread regarding nodes not actually participating
>>>>> until they have a partner to join with them to avoid even numbered
>>>>> clusters), but I think the design should not preclude the possibility of
>>>>> nodes becoming eligible for some roles or withdrawing from some roles, and
>>>>> treatment of roles should be consistent. In some cases someone may decide
>>>>> it's worth the work of handling the concurrency concerns, best if they
>>>>> don't have to break back compat or hack their code around the assumption it
>>>>> wouldn't happen to do it.
>>>>>
>>>>> Taking the zookeeper case as an example, it very much might be
>>>>> desirable to have the possibility to heal the zk cluster by promoting
>>>>> another node (configured as eligible for zk) to active zk duty if one of
>>>>> the current zk nodes has been down long enough (say on prem hardware,
>>>>> motherboard pops a capacitor, server gone for a week while new hardware is
>>>>> purchased, built and configured). Especially if the down node didn't hold
>>>>> data or other nodes had sufficient replicas and the cluster is still
>>>>> answering queries just fine.
>>>>>
>>>>>
>>>>>>
>>>>>> > I know of a case that would benefit from having separate
>>>>>> Query/Update nodes that handle a heavy analysis process which would be
>>>>>> deployed to a number of CPU heavy boxes (which might add more in prep for
>>>>>> bulk indexing, and remove them when bulk was done), data could then be
>>>>>> hosted on cheaper nodes....
>>>>>>
>>>>>> This is the main motivation behind this work. SOLR-15715 needs this,
>>>>>> and hence it would be good to get this in as soon as possible.
>>>>>>
>>>>>
>>>>> I think we can incrementally work towards configurability for all of
>>>>> these roles. The current default state is that a node has all roles and the
>>>>> incremental progress is to enable removing a role from a node. This I think
>>>>> is why it might be good to:
>>>>>
>>>>> A) Determine the set of roles our current solr nodes are performing
>>>>> (that might be removed in some scenario) and document this via assigning
>>>>> these roles as default on as this SIP goes live.
>>>>> B) Figure out what the process of adding something entirely new that
>>>>> we haven't yet thought of with its own role would look like.
>>>>>
>>>>> I think it would be great if we not only satisfied the current need
>>>>> but determined how we expect this to change over time.
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Ishan
>>>>>>
>>>>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]> wrote:
>>>>>>
>>>>>>> The SIP looks like a good start, and I was already thinking of
>>>>>>> something very similar to this as a follow on to my attempts to split
>>>>>>> the uber filter (SolrDispatchFilter) into servlets such that roles
>>>>>>> determine what servlets are deployed, but I would like to recommend
>>>>>>> that the roles be all positive ("Can do this") and nodes with no role
>>>>>>> at all are ineligible for all activities (just like standard role
>>>>>>> permissioning systems). This will make it much more familiar and easy
>>>>>>> to think about. Therefore there would be no need for a role such as
>>>>>>> !data which I presume was meant to mean "no data on this node"...
>>>>>>> rather just don't give the "data" role to the node.
>>>>>>>
>>>>>>> Additional node roles I think should exist:
>>>>>>>
>>>>>>> I think that we should expand/edit your list of roles to be
>>>>>>>
>>>>>>>    - QUERY - accepts and analyzes queries up to the point of
>>>>>>>    actually consulting the lucene index (useful if you have a very heavy
>>>>>>>    analysis phase)
>>>>>>>    - UPDATE - accepts update requests, and performs update
>>>>>>>    functionality prior to and including
>>>>>>>    DistributedUpdateProcessorFactory
>>>>>>>    (useful if you have a very heavy analysis phase)
>>>>>>>    - ADMIN - accepts admin/management commands
>>>>>>>    - UI - hosts an admin ui
>>>>>>>    - ZOOKEEPER - hosts embedded zookeeper
>>>>>>>    - OVERSEER - performs overseer related functionality (though
>>>>>>>    IIRC there's a proposal to eliminate overseer that might eliminate
>>>>>>>    this)
>>>>>>>    - DATA - nodes where there is a lucene index and matching
>>>>>>>    against the analyzed results of a query may be conducted to generate
>>>>>>>    a response, also performs update steps that come after
>>>>>>>    DistributedUpdateProcessorFactory
>>>>>>>
>>>>>>> I also suggest that these roles each have a node in zookeeper
>>>>>>> listing the current member nodes (as child nodes) so that code that
>>>>>>> wants to find a node with an appropriate role does not need to scan the
>>>>>>> list of all nodes parsing something to discover which nodes apply and
>>>>>>> also does not have to parse json to do it. I think this will be
>>>>>>> particularly key for zookeeper nodes which might be 3 out of 100 or
>>>>>>> more nodes. Similar to how we track live nodes. I think we should have
>>>>>>> a nodes.json too that tracks what roles a node is ALLOWED to take (as
>>>>>>> opposed to which roles it is currently servicing).
>>>>>>>
>>>>>>> So running code consults the zookeeper role list of nodes, and any
>>>>>>> code seeking to transition a node (an admin operation with much lower
>>>>>>> performance requirements) consults the json data in the nodes.json node,
>>>>>>> parses it, finds the node in question and checks what it's eligible for
>>>>>>> (this will correspond to which servlets/apps have been loaded).
>>>>>>>
>>>>>>> I know of a case that would benefit from having separate
>>>>>>> Query/Update nodes that handle a heavy analysis process which would be
>>>>>>> deployed to a number of CPU heavy boxes (which might add more in prep for
>>>>>>> bulk indexing, and remove them when bulk was done), data could then be
>>>>>>> hosted on cheaper nodes....
>>>>>>>
>>>>>>> Also maybe think about how this relates to NRT/TLOG/PULL which are
>>>>>>> also maybe role like
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> -Gus
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>
>>>>>>>> We also wish to add first class support for Query nodes that are
>>>>>>>> used to process user queries by forwarding to data nodes,
>>>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>>>> Solr to catch up.
>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ishan / Noble / Hitesh
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>
