Re: First class support for node roles

Ishan Chattopadhyaya Tue, 30 Nov 2021 12:21:37 -0800

On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <[email protected]> wrote:

> Replying to the top post in this thread because there has been a lot of
> discussion and I don't want to look like I'm continuing any of those
> particular threads.
>
> I finally had time to sit down and think about this with the attention it
> deserves and am generally happy with how the conversation has shaped the
> current proposal.
>
> GOOD: I think using system properties to define node roles is fine and I
> like that data is the default role when not defined. I think it is
> important to hold on to the guarantee that an active overseer will land on
> an overseer node role.
> CHANGE REQUEST: I would like to see a migration path for folks using the
> current OVERSEER role. I am not sure that something can be done
> automatically since they need to now specify new properties at startup.
> Maybe we need to include loud warnings or support both approaches for a
> time?
> CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then
> it is implied the overseer will go to one of the data nodes. The specific
> wording in the SIP - "When one or more such nodes are live, Solr guarantees
> that one of those nodes become the overseer." implies to me that failover
> could go from overseer1 to overseer2 to overseerN to random node. I feel
> like we need to have some recording that there were dedicated overseer
> nodes and stop the cascading failure instead of churning through our data
> nodes.
>
> CLARIFICATION: I am slightly confused by the proposed scope of
> "coordinator" roles from a split query/indexing standpoint. I understand
> that these are used as examples, but would like stronger language that new
> roles should also go through their own SIP discussions.
>
> CLARIFICATION: I do not like that we are storing node liveness in two
> different places now. We have the live nodes and we have the node roles
> stored in two different places in zookeeper and it feels like this would
> lead to race conditions or split brain or other hard to diagnose bugs when
> those two lists don't agree with each other. This also feels like it
> contradicts the "single source of truth" idea later stated in the proposal.
> I see Gus's arguments for decoupling these and am not strongly opposed, I
> just get a lurking feeling about it. Even if we don't do this, I would like
> this called out explicitly in the alternative approaches section as
> something that we considered and rejected, with details why,
>
> GOOD: The API looks pretty clear. I would like an additional call out here
> that all operations are GET because nodes cannot be changed at runtime.
> CLARIFICATION: How does this interact with the previous OVERSEER
> preference role?
> CHANGE REQUEST: An additional API to get the list of available roles for a
> cluster. I _think_ this could be based on the version that the cluster is
> running? Would be useful to be able to interrogate a cluster in the
> future... we're seeing OOM issues on queries, can we add some query nodes?
> When were they introduced? I don't know what path this API should exist at.
>


Added a *GET /api/cluster/roles/supported* API, updated the SIP document.
Not sure if there's a better path that we could go for.


> CLARIFICATION: Can we list the APIs to clearly show which parts are string
> literals and which parts are meant to be substituted by the operator?
> *GET **/api/cluster/roles/data *would become *GET 
> **/api/cluster/roles/${rolename}
> *in our SIP/documentation.
> CHANGE REQUEST: I think *GET /api/cluster/roles/nodes/node1* should be *GET
> /api/cluster/roles/${nodename}* dropping the intermediate "nodes"
> CHANGE REQUEST: The ZK structure also might not need that intermediate
> "nodes" node.
>
> CLARIFICATION: Should listing roles require some permissions? Maybe this
> requirement is too fundamental to the operation of a cluster and everybody
> would have to be able to do it.
> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles?
> Implementation detail that the servers will figure out? Or strict guidance
> where the client needs to check where specific roles are before sending any
> further communication to the server?
> CLARIFICATION: What happens when a node gets a request that it can't
> fulfil? An overseer node gets a query or an update. A data node gets a
> collection creation request. Do they forward it on to an appropriate node,
> or do they reject it? Should this be configurable? If not, then it seems
> like lazy or poorly configured clients will defeat this isolation system
> quite easily.
>
> GOOD: Testing the API is very important, yes.
> CLARIFICATION: What does testing for how nodes behave when roles are added
> mean? I thought we established that they are not dynamic.
>
>
> Thanks,
> Mike
>
> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
> [email protected]> wrote:
>
>> Hi,
>>
>> Here's an SIP for introducing the concept of node roles:
>> https://issues.apache.org/jira/browse/SOLR-15694
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>
>> We also wish to add first class support for Query nodes that are used to
>> process user queries by forwarding to data nodes, merging/aggregating them
>> and presenting to users. This concept exists as first class citizens in
>> most other search engines. This is a chance for Solr to catch up.
>> https://issues.apache.org/jira/browse/SOLR-15715
>>
>> Regards,
>> Ishan / Noble / Hitesh
>>
>

Re: First class support for node roles

Reply via email to