Hi Gus,
> I think that we should expand/edit your list of roles to be
The list can be expanded as and when more isolation and features are
needed. I listed only those roles for which functionality already exists
or is under development.
> I would like to recommend that the roles be all positive ("Can do this")
> and nodes with no role at all are ineligible for all activities.
It comes down to the defaults and backcompat. If we want all Solr nodes to
be able to host data replicas by default (without the user explicitly
specifying role=data), then we need a way to unset this role. The most
reasonable way seemed to be a "!data" role. We can do away with !data if we
mandate that every data node have the role "data" explicitly defined for
it, but that breaks backcompat and is also cumbersome for those who don't
want to use these special roles.
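
To make the default-plus-negation idea concrete, here is a minimal sketch
in Python. It is only an illustration under assumed semantics, not the SIP's
implementation: the function name effective_roles, the DEFAULT_ROLES
constant, and the handling of a "!" prefix as "remove this role" are all
hypothetical.

```python
from typing import FrozenSet, Iterable

# Assumption: every node can host data unless it opts out.
DEFAULT_ROLES = frozenset({"data"})


def effective_roles(specified: Iterable[str] = ()) -> FrozenSet[str]:
    """Resolve a node's roles from the defaults plus user-specified tokens.

    A plain token (e.g. "overseer") adds a role; a token prefixed with "!"
    (e.g. "!data") removes it. This syntax is hypothetical.
    """
    roles = set(DEFAULT_ROLES)
    for token in specified:
        token = token.strip()
        if not token:
            continue  # ignore empty tokens
        if token.startswith("!"):
            roles.discard(token[1:])  # negation: unset a (default) role
        else:
            roles.add(token)
    return frozenset(roles)
```

Under these assumptions, a node started with no roles still hosts data
(backcompat), while a node started with "!data" plus "overseer" ends up with
only the overseer role.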
> I also suggest that these roles each have a node in zookeeper listing the
> current member nodes (as child nodes) so that code that wants to find a
> node with an appropriate role does not need to scan the list of all nodes
> parsing something to discover which nodes apply and also does not have to
> parse json to do it.
/roles.json exists today; it has the role as key and a list of nodes as
value. In the next major version, we could change the format of that file
to use the node as key and its list of roles as value. Or we could add the
roles to the data for each item in the list of live_nodes.
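
As a rough illustration of the two layouts, the sketch below (Python)
inverts a role -> nodes mapping into a node -> roles mapping. The node names
and the sample JSON are made up for the example, and invert_roles is a
hypothetical helper; the exact contents of /roles.json may differ.

```python
import json
from collections import defaultdict
from typing import Dict, List


def invert_roles(role_to_nodes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {role: [nodes]} into {node: [roles]} with sorted, stable output."""
    node_to_roles = defaultdict(list)
    for role, nodes in role_to_nodes.items():
        for node in nodes:
            node_to_roles[node].append(role)
    return {node: sorted(roles) for node, roles in sorted(node_to_roles.items())}


# Hypothetical sample in the current "role as key" layout.
current = json.loads(
    '{"overseer": ["host1:8983_solr"],'
    ' "data": ["host1:8983_solr", "host2:8983_solr"]}'
)

# The same information in the proposed "node as key" layout.
by_node = invert_roles(current)
print(json.dumps(by_node, indent=2))
```

With the node-keyed layout, looking up one node's roles is a single key
access; finding all nodes with a given role becomes the scan instead.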
> any code seeking to transition a node
We considered this situation and realized that it is very risky to have
nodes change roles while they are up and running. Better to assign fixed
roles upon startup.
> I know of a case that would benefit from having separate Query/Update
> nodes that handle a heavy analysis process which would be deployed to a
> number of CPU heavy boxes (which might add more in prep for bulk indexing,
> and remove them when bulk was done), data could then be hosted on cheaper
> nodes....
This is the main motivation behind this work. SOLR-15715 needs this, and
hence it would be good to get this in as soon as possible.
Regards,
Ishan
On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]> wrote:
> The SIP looks like a good start, and I was already thinking of something
> very similar to this as a follow on to my attempts to split the uber filter
> (SolrDispatchFilter) into servlets such that roles determine what servlets
> are deployed, but I would like to recommend that the roles be all positive
> ("Can do this") and nodes with no role at all are ineligible for all
> activities. (just like standard role permissioning systems). This will make
> it much more familiar and easy to think about. Therefore there would be no
> need for a role such as !data which I presume was meant to mean "no data on
> this node"... rather just don't give the "data" role to the node.
>
> Additional node roles I think should exist:
>
> I think that we should expand/edit your list of roles to be
>
> - QUERY - accepts and analyzes queries up to the point of actually
> consulting the lucene index (useful if you have a very heavy analysis
> phase)
> - UPDATE - accepts update requests, and performs update functionality
> prior to and including DistributedUpdateProcessorFactory (useful if you
> have a very heavy analysis phase)
> - ADMIN - accepts admin/management commands
> - UI - hosts an admin ui
> - ZOOKEEPER - hosts embedded zookeeper
> - OVERSEER - performs overseer related functionality (though IIRC
> there's a proposal to eliminate overseer that might eliminate this)
> - DATA - nodes where there is a lucene index and matching against the
> analyzed results of a query may be conducted to generate a response, also
> performs update steps that come after DistributedUpdateProcessorFactory
>
> I also suggest that these roles each have a node in zookeeper listing the
> current member nodes (as child nodes) so that code that wants to find a
> node with an appropriate role does not need to scan the list of all nodes
> parsing something to discover which nodes apply and also does not have to
> parse json to do it. I think this will be particularly key for zookeeper
> nodes which might be 3 out of 100 or more nodes. Similar to how we track
> live nodes. I think we should have a nodes.json too that tracks what roles
> a node is ALLOWED to take (as opposed to which roles it is currently servicing)
>
> So running code consults the zookeeper role list of nodes, and any code
> seeking to transition a node (an admin operation with much lower
> performance requirements) consults the json data in the nodes.json node,
> parses it, finds the node in question and checks what it's eligible for
> (this will correspond to which servlets/apps have been loaded).
>
> I know of a case that would benefit from having separate Query/Update
> nodes that handle a heavy analysis process which would be deployed to a
> number of CPU heavy boxes (which might add more in prep for bulk indexing,
> and remove them when bulk was done), data could then be hosted on cheaper
> nodes....
>
> Also maybe think about how this relates to NRT/TLOG/PULL which are also
> maybe role like
>
> WDYT?
>
> -Gus
>
>
> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <
> [email protected]> wrote:
>
>> Hi,
>>
>> Here's an SIP for introducing the concept of node roles:
>> https://issues.apache.org/jira/browse/SOLR-15694
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>
>> We also wish to add first class support for Query nodes that are used to
>> process user queries by forwarding to data nodes, merging/aggregating them
>> and presenting to users. This concept exists as first class citizens in
>> most other search engines. This is a chance for Solr to catch up.
>> https://issues.apache.org/jira/browse/SOLR-15715
>>
>> Regards,
>> Ishan / Noble / Hitesh
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>