>
> CLARIFICATION: I do not like that we are storing node liveness in two
> different places now. We have the live nodes and we have the node roles
> stored in two different places in zookeeper and it feels like this would
> lead to race conditions or split brain or other hard to diagnose bugs when
> those two lists don't agree with each other. This also feels like it
> contradicts the "single source of truth" idea later stated in the proposal.
> I see Gus's arguments for decoupling these and am not strongly opposed; I
> just get a lurking feeling about it. Even if we don't do this, I would like
> this called out explicitly in the alternative approaches section as
> something that we considered and rejected, with details why.
>
>
Yes, I had that thought and reconciled it for myself by
realizing/theorizing that the new structure does not represent liveness; it
represents "role-ness", which is a different bit of information. Using it in
code as a check for liveness would be wrong. In any case, we always need to
be prepared to handle the case where the node disappeared between when we
checked the list (roles or live_nodes) and when we tried to contact the
node. The power cord could have been unplugged in the interval.
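To make that concrete, here is a minimal, hypothetical sketch of the pattern
callers need regardless of which list they consult (nodesWithRole() and
sendRequest() are made-up helpers, not existing Solr APIs):

    // Hypothetical illustration: role membership (or live_nodes) is a hint, not a guarantee.
    for (String node : nodesWithRole("overseer")) {   // snapshot of the roles list
        try {
            sendRequest(node, request);               // node may have vanished since the snapshot
            break;                                    // success: stop trying candidates
        } catch (IOException e) {
            // expected failure mode; fall through and try the next candidate
        }
    }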
> CHANGE REQUEST: The ZK structure also might not need that intermediate
> "nodes" node.
>
I argued for the extra level for the following reason: roles may want to
coordinate additional information in zk (who IS overseer vs. who COULD BE),
or perhaps a pre-determined election order to speed up elections. Or zk
nodes might want to record the desired redundancy for zk, and how long past
zk nodes have been down, so another zk can be brought up from the pool of zk
nodes if some timeout has been exceeded... And Ilan gave an example I don't
recall at the moment that persuaded me that we can't really predict what
each role wants to track, so my "capable vs. providing" distinction got
converted to a space in the structure (a peer to "nodes") to track any such
info that a role needs. This namespaces role-related stuff, enables a
recursive watch if desired, and generally keeps role-related coordination
data organized together in zk. So as a concrete example you might have:
/node-roles
  /zookeeper
    /nodes
      host1_8983_solr  {"rack":"A"}
      host2_8983_solr  {"rack":"A"}
      host21_8983_solr {"rack":"B"}
      host22_8983_solr {"rack":"B"}
      host41_8983_solr {"rack":"C"}
      host42_8983_solr {"rack":"C"}
    /cluster {"redundancy":"3", "maxDownMin":"30", "rackAwareElections":"true"}
    /current
      host1_8983_solr
      host21_8983_solr
    /missing
      host41_8983_solr {"since":"2021-12-23T12:34:56.7890"}
    /election
Note that this cluster could have 60 live nodes, only six of which have the
zk role. Just an example, of course... we might not choose these features
for zk nodes, but the point is to leave a spot in which to implement
whatever we decide we want. Also, the JSON at /cluster might simply be
attached to /zookeeper instead... but at the moment we aren't specifying how
roles handle their role-specific coordination data.
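For what it's worth, a role whose data is namespaced this way could read it
with the stock ZooKeeper client. A minimal sketch, assuming the illustrative
paths above actually exist and a zk ensemble at localhost:2181:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.ZooKeeper;
    import java.util.List;

    public class ZkRoleReader {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, (WatchedEvent e) -> {});

            // nodes carrying the zookeeper role (role membership, not liveness)
            List<String> roleNodes = zk.getChildren("/node-roles/zookeeper/nodes", false);

            // role-specific coordination data lives under the same namespace
            byte[] clusterCfg = zk.getData("/node-roles/zookeeper/cluster", false, null);

            System.out.println("zk-role nodes: " + roleNodes);
            System.out.println("cluster cfg:   " + new String(clusterCfg));
            zk.close();
        }
    }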
>
> CLARIFICATION: What happens when a node gets a request that it can't
> fulfil? An overseer node gets a query or an update. A data node gets a
> collection creation request. Do they forward it on to an appropriate node,
> or do they reject it? Should this be configurable? If not, then it seems
> like lazy or poorly configured clients will defeat this isolation system
> quite easily.
>
This seems like something for each role to decide and/or configure.
Thinking specifically of the request to be elected overseer: cluster-level
config (at /cluster, or attached to the role node as above?) could determine
whether the existing (back-compatible) fallback to a non-overseer node is
desired or not...
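Purely as a hypothetical sketch of what that per-role decision might look
like (none of these names exist in Solr; the "forward vs. reject" choice is
the configurable bit):

    // Hypothetical policy for a request that doesn't match this node's role.
    // All helpers here are made up for illustration.
    enum MismatchPolicy { REJECT, FORWARD }

    void route(Request req, String myRole, MismatchPolicy policy) {
        if (roleHandles(myRole, req)) {
            processLocally(req);
        } else if (policy == MismatchPolicy.FORWARD) {
            forwardToNodeWithRoleFor(req);   // back-compatible, convenient for lazy clients
        } else {
            rejectWithError(req);            // strict isolation; misdirected requests fail fast
        }
    }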