Ilan, can you provide a more detailed, concrete example? I’m having a lot of trouble understanding what you are proposing, beyond that it is somehow at odds with what Ishan/Noble suggest.
Apologies for my failure to understand. Thanks, Mike On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <[email protected]> wrote: > If we go with optional role params, we need two defaults: > 1. the param value to use when the role is specified without a parameter, > and > 2. the param value to use for the role on a node for which the role is > not specified at all. > > I don't know how to sensibly name these defaults, but the actual > values would be: > overseer: default1=preferred, default2=allowed > data: default1=on, default2=on > coordinator: default1=on, default2=off > > If we do not allow specifying a role without a parameter, then > default1 does not exist and the example Noble posted earlier covers > us. But simple roles will be easier to use without parameters (and the > transition from existing overseer role would be trivial). > > On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya > <[email protected]> wrote: > > > > I'm +1 on this. It "looks" complicated at first, but simplifies all > headaches going forward. > > > > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <[email protected]> wrote: > >> > >> I shall update the SIP proposal if we have a consensus on this > configuration > >> > >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <[email protected]> wrote: > >>> > >>> > >>> > >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <[email protected]> wrote: > >>>> > >>>> I like this in that it's an example of how the overseer might be > extended without creating a new role :) > >>>> > >>>> Not entirely sure if I'm for or against an enum implementation here, > but it makes me a bit nervous. Enums with complexity can quickly get into > difficulty for unit tests (especially if one wanted to write a mock object > based test, something I think we maybe should use a bit more than we do). 
> >>>> > >>>> > >>>> > >>>> I would tend to think of a class to represent and collect role > related functionality, one that perhaps has methods that receive the > request, or other key objects and thus could be tested without standing up > an entire server. (Not against also having them exercised in a few > integrated tests, but the more we can avoid interleaving logic directly > within DispatchFilter and HttpSolrCall etc. the better.) > >>>> > >>>> > >>>> So I guess I'm somewhat biased against any enum with more than a > couple properties, and definitely don't want to wind up hanging lots of > methods off of one. Better to use them to consume a configuration value and > then instantiate a class that really holds the logic and data. I like them > for constraining values and easy string value conversion but the more they > look like classes the more I'd rather have a class. > >>> > >>> > >>> I just meant it is a set of values. Please let us not discuss the > actual impl here. We should stick to discussing the high level design here > and specifics should be dealt with in a PR > >>>> > >>>> > >>>> -Gus > >>>> > >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <[email protected]> > wrote: > >>>>> > >>>>> I recommend the following format for the role spec > >>>>> > >>>>> roles=<role-name>:<role-value> > >>>>> > >>>>> each role will have an enum of allowed values and a default value > >>>>> > >>>>> role name: data > >>>>> > >>>>> values: [on, off] > >>>>> default: on > >>>>> > >>>>> role name: overseer > >>>>> > >>>>> values: [allowed, disallowed, preferred] > >>>>> default: allowed > >>>>> > >>>>> role name: coordinator > >>>>> > >>>>> values: [on, off] > >>>>> default: off > >>>>> > >>>>> > >>>>> examples > >>>>> roles=data:on,overseer:allowed (This is redundant because it uses > all the default values.
If a node is started without any roles value this > is the default behavior) > >>>>> roles=data:off,overseer:preferred ( do not allow data, join overseer > election at head) > >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data, > it's same as roles=coordinator:on) > >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data) > >>>>> > >>>>> > >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <[email protected]> > wrote: > >>>>>> > >>>>>> If we go with no negative node roles and overseer node role is not > strict (i.e. it’s a "preferred overseer"), then one would need to define a > second node role "no_overseer" to explicitly exclude a node from ever > becoming overseer (which I think is a useful feature until we switch the > cluster default to not using the overseer), plus the implementation of > these two node roles will obviously be coupled (and what if a node has both > defined?). > >>>>>> > >>>>>> I prefer strict node roles. > >>>>>> Maybe we could have node roles with [optional] parameters to let > the node role implementation decide ? > >>>>>> The overseer node role for example could have one of 3 values > defined for each node: “preferred” (default, equivalent to the existing > overseer role), "accepted" (equivalent to currently not defining the > overseer role) and "no_way" (does not exist today). > >>>>>> > >>>>>> This could be useful in other contexts. A node role “data” could be > “fast” or “slow” depending on type of local persistent storage for example… > >>>>>> > >>>>>> Ilan > >>>>>> > >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <[email protected]> wrote: > >>>>>>> > >>>>>>> I really don't think we should have types of roles. Not > negative/positive and not strict/non-strict. You have a role or you don't. > What that means is up to the code implementing the role. 
> >>>>>>> > >>>>>>> Roles should be free to configure a preference order (binary, or > n-ary or whatever, strict or loose), prohibit behavior, or enable behavior. > In this SIP I feel we should focus on how to identify what node has what > role, how to designate what roles a node has via config/params, and the > APIs for interacting with roles. > >>>>>>> > >>>>>>> We should for example be able to support roles such as > >>>>>>> > >>>>>>> PREFERRED_OVERSEER > >>>>>>> DATA > >>>>>>> NO_ROUTED_ALIAS (just an example, not something I mean to suggest) > >>>>>>> > >>>>>>> Details about role implementation should probably be discussed in > a thread about that role. Obviously we should think about the name > carefully to leave options open should we want to enhance things later so > maybe > >>>>>>> > >>>>>>> OVERSEER_PREF or just OVERSEER > >>>>>>> > >>>>>>> would be better since it merely reads that the node implements > some sort of preference or config regarding overseer... but all this can be > decided on a per role basis > >>>>>>> > >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <[email protected]> > wrote: > >>>>>>>> > >>>>>>>> Negative roles have a place > >>>>>>>> > >>>>>>>> Example is overseer > >>>>>>>> > >>>>>>>> There are 3 possible choices for that role > >>>>>>>> > >>>>>>>> a) preferred: always be in front of the election queue > >>>>>>>> b) on: not preferred, but can be an overseer if no preferred > overseer nodes are available > >>>>>>>> c) off: never become an overseer > >>>>>>>> > >>>>>>>> Today we only have options 'a' and 'b'. In a future ticket, we > may implement 'c' > >>>>>>>> > >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <[email protected]> wrote: > >>>>>>>>> > >>>>>>>>> Negative roles add a lot of complexity, I would really want to > stay away from them. That’s why I want strict roles up front. It’s maybe ok > to push this decision out, but it also seems like the sort of thing we > should consider at the start.
> >>>>>>>>> > >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <[email protected]> > wrote: > >>>>>>>>>> > >>>>>>>>>> Yes. Negative roles is not a bad idea. If I start a node for > machine learning purposes, I wouldn't want that node to ever participate in > overseer election > >>>>>>>>>> > >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <[email protected]> > wrote: > >>>>>>>>>>> > >>>>>>>>>>> If we have non strict roles (like overseer), then it does make > sense > >>>>>>>>>>> to have negative roles. > >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer > the > >>>>>>>>>>> overseer to run on, and a few other nodes on which it should > >>>>>>>>>>> definitely never run for various reasons. And in case these > >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the > cluster > >>>>>>>>>>> fail the same way it would if there were no data nodes > available. > >>>>>>>>>>> > >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman < > [email protected]> wrote: > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users > cannot trip themselves up by default, but the option is there for people to > tinker and have an iron grip over their cluster. > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The > option to tinker for tighter grip can be tackled later, either on a per > role basis or as a generic concept later. > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not > needed for this SIP > >>>>>>>>>>> > > >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya < > [email protected]> wrote: > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <[email protected]> > wrote: > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> I think the key is to let the roles have full control of > the implications of having/not having that role. 
No need for even a > strict/loose designation. The question of do you have the role is yes/no > with no logic to guess if the role is implied or not, The question of will > it come up with the role is "have_explicit ? use_defaults : use_defaults. > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that > means is up to the role code. > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works > in this SIP. We can rework it or not as we see fit separately. > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> +1 > >>>>>>>>>>> >> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the > above clear on first read through the SIP :) > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> -Gus > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman < > [email protected]> wrote: > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> This doesn't really address my concern around what > happens if all of our existing OVERSEER candidates are down. When at least > one of them is up, the overseer will go there, and that is good and > expected. But what happens if all of the overseer eligible nodes are down. > Your comment, and the old system, would imply that the overseer election > goes to some other unrelated, untagged node. I disagree with this > implementation choice. This sounds like something role specific to > determine, but I would like to see us be more strict about it. I don't want > cores leaking out of my data roles, I don't want query processing to leak > out of my "query" nodes or whatever. Overseer shouldn't be special in this > regard. > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a > system in which the cluster can be "live" without an overseer. 
I understand > that the overseer can be taxing to the cluster, but honestly what is the > point of having an untaxed cluster that doesn't have an overseer? I can see > arguments for the other roles to be stricter about this, but there are also > a lot of users who wouldn't want those to be strict either (like "query" > nodes). > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a > non-overseer role node HAS to be selected to become overseer, it will try > to migrate the overseer job to a node with the overseer role whenever one > becomes live. > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but > instead roles can either be defined as "Strict" or "Loose" (better names > likely exist), and the roles come with a default (Overseer -> Loose, Data > -> Strict, Query -> Loose, etc.). And it is up to each role to define how > to behave when running in LOOSE mode and a non-role node is used then a > role node comes online (like the overseer example given above). > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users > cannot trip themselves up by default, but the option is there for people to > tinker and have an iron grip over their cluster. > >>>>>>>>>>> >>>> > >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <[email protected]> > wrote: > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> Noble wrote: > >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works > today. We are just changing the definition and standardizing the > configuration & discoverability > >>>>>>>>>>> >>>>> Ishan wrote: > >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the > OVERSEER role (which currently stands for preferred overseer). We can take > a stab at refactoring it later. > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> Grouping these two comments together, since I think they > are saying the same thing. I think this is part of my confusion. 
We have an > old system that doesn't work the way we want the new system to work. There > may be people already using the old system. What path do we offer for folks > using the old system to migrate to the new system? What happens if somebody > accidentally tries to use both systems at the same time? > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> Ishan wrote: > >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with > OVERSEER role] are live, Solr guarantees that one of those nodes becomes > the overseer.", I meant to somewhat capture the current behaviour as the > OVERSEER role performs today. Do you see any inconsistency with this > statement vs. what it does today? > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> This doesn't really address my concern around what > happens if all of our existing OVERSEER candidates are down. When at least > one of them is up, the overseer will go there, and that is good and > expected. But what happens if all of the overseer eligible nodes are down. > Your comment, and the old system, would imply that the overseer election > goes to some other unrelated, untagged node. I disagree with this > implementation choice. This sounds like something role specific to > determine, but I would like to see us be more strict about it. I don't want > cores leaking out of my data roles, I don't want query processing to leak > out of my "query" nodes or whatever. Overseer shouldn't be special in this > regard. > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> Noble wrote: > >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a > node in the following request? > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. > Let's leave it as is. 
> >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya < > [email protected]> wrote: > >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob < > [email protected]> wrote: > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there > has been a lot of discussion and I don't want to look like I'm continuing > any of those particular threads. > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this > with the attention it deserves and am generally happy with how the > conversation has shaped the current proposal. > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node > roles is fine and I like that data is the default role when not defined. I > think it is important to hold on to the guarantee that an active overseer > will land on an overseer node role. > >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path > for folks using the current OVERSEER role. I am not sure that something can > be done automatically since they need to now specify new properties at > startup. Maybe we need to include loud warnings or support both approaches > for a time? > >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the > overseer nodes fail, then it is implied the overseer will go to one of the > data nodes. The specific wording in the SIP - "When one or more such nodes > are live, Solr guarantees that one of those nodes become the overseer." > implies to me that failover could go from overseer1 to overseer2 to > overseerN to random node. I feel like we need to have some recording that > there were dedicated overseer nodes and stop the cascading failure instead > of churning through our data nodes. 
> >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed > scope of "coordinator" roles from a split query/indexing standpoint. I > understand that these are used as examples, but would like stronger > language that new roles should also go through their own SIP discussions. > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node > liveness in two different places now. We have the live nodes and we have > the node roles stored in two different places in zookeeper and it feels > like this would lead to race conditions or split brain or other hard to > diagnose bugs when those two lists don't agree with each other. This also > feels like it contradicts the "single source of truth" idea later stated in > the proposal. I see Gus's arguments for decoupling these and am not > strongly opposed, I just get a lurking feeling about it. Even if we don't > do this, I would like this called out explicitly in the alternative > approaches section as something that we considered and rejected, with > details why, > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an > additional call out here that all operations are GET because nodes cannot > be changed at runtime. > >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the > previous OVERSEER preference role? > >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of > available roles for a cluster. I _think_ this could be based on the version > that the cluster is running? Would be useful to be able to interrogate a > cluster in the future... we're seeing OOM issues on queries, can we add > some query nodes? When were they introduced? I don't know what path this > API should exist at. > >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated > the SIP document. Not sure if there's a better path that we could go for. 
> >>>>>>>>>>> >>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show > which parts are string literals and which parts are meant to be substituted > by the operator? GET /api/cluster/roles/data would become GET > /api/cluster/roles/${rolename} in our SIP/documentation. > >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET > /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} > dropping the intermediate "nodes" > >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need > that intermediate "nodes" node. > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some > permissions? Maybe this requirement is too fundamental to the operation of > a cluster and everybody would have to be able to do it. > >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other > clients) to treat roles? Implementation detail that the servers will figure > out? Or strict guidance where the client needs to check where specific > roles are before sending any further communication to the server? > >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request > that it can't fulfil? An overseer node gets a query or an update. A data > node gets a collection creation request. Do they forward it on to an > appropriate node, or do they reject it? Should this be configurable? If > not, then it seems like lazy or poorly configured clients will defeat this > isolation system quite easily. > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes. > >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave > when roles are added mean? I thought we established that they are not > dynamic. 
> >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> Thanks, > >>>>>>>>>>> >>>>>>> Mike > >>>>>>>>>>> >>>>>>> > >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya < > [email protected]> wrote: > >>>>>>>>>>> >>>>>>>> > >>>>>>>>>>> >>>>>>>> Hi, > >>>>>>>>>>> >>>>>>>> > >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node > roles: > >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694 > >>>>>>>>>>> >>>>>>>> > https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles > >>>>>>>>>>> >>>>>>>> > >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query > nodes that are used to process user queries by forwarding to data nodes, > merging/aggregating them and presenting to users. This concept exists as > first class citizens in most other search engines. This is a chance for > Solr to catch up. > >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715 > >>>>>>>>>>> >>>>>>>> > >>>>>>>>>>> >>>>>>>> Regards, > >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> -- > >>>>>>>>>>> >>> http://www.needhamsoftware.com (work) > >>>>>>>>>>> >>> http://www.the111shift.com (play) > >>>>>>>>>>> > >>>>>>>>>>> > --------------------------------------------------------------------- > >>>>>>>>>>> To unsubscribe, e-mail: [email protected] > >>>>>>>>>>> For additional commands, e-mail: [email protected] > >>>>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> http://www.needhamsoftware.com (work) > >>>>>>> http://www.the111shift.com (play) > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> ----------------------------------------------------- > >>>>> Noble Paul > >>>> > >>>> > >>>> > >>>> -- > >>>> http://www.needhamsoftware.com (work) > >>>> http://www.the111shift.com (play) > >>> > >>> > >>> > >>> -- > >>> ----------------------------------------------------- > >>> Noble Paul > >> > >> > >> > >> -- > >> 
----------------------------------------------------- > >> Noble Paul
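
For readers trying to picture the "two defaults" idea quoted above, here is a minimal sketch that combines Noble's `roles=<role-name>:<role-value>` format with Ilan's per-role defaults. The role names and values are taken from the thread; the `ROLE_DEFAULTS` table, its `bare`/`absent` keys, and the `resolve_roles()` function are hypothetical names invented for illustration, not anything in Solr or the SIP. Accepting a bare role name (for example `roles=overseer`) is only one of the options under discussion.

```python
# Hypothetical sketch only: names below are illustrative, not Solr APIs.
# "bare"   = value used when the role is listed without a parameter (Ilan's default 1)
# "absent" = value used when the role is not listed at all (Ilan's default 2)
ROLE_DEFAULTS = {
    "overseer": {"values": {"allowed", "disallowed", "preferred"},
                 "bare": "preferred", "absent": "allowed"},
    "data": {"values": {"on", "off"}, "bare": "on", "absent": "on"},
    "coordinator": {"values": {"on", "off"}, "bare": "on", "absent": "off"},
}


def resolve_roles(spec):
    """Turn a roles string such as 'data:off,overseer' into a value per role."""
    # Start from the "absent" default for every known role.
    resolved = {name: d["absent"] for name, d in ROLE_DEFAULTS.items()}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty spec or stray commas
        name, sep, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if name not in ROLE_DEFAULTS:
            raise ValueError("unknown role: " + name)
        if not sep:
            # Role named without a parameter: use the "bare" default.
            value = ROLE_DEFAULTS[name]["bare"]
        elif value not in ROLE_DEFAULTS[name]["values"]:
            raise ValueError("invalid value %r for role %s" % (value, name))
        resolved[name] = value
    return resolved


print(resolve_roles(""))
print(resolve_roles("overseer"))
print(resolve_roles("coordinator:on,data:off"))
```

If bare role names were not allowed, the `bare` column would disappear and only the `absent` defaults would remain, which matches the simpler scheme in Noble's examples.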
