----- Original Message -----
> On 02/14/2017 02:51 PM, Nils Carlson wrote:
> > Hi,
> >
> > I'm working on implementing a MariaDB resource-agent based on the
> > mysql one. The idea is to take advantage of new features in MariaDB,
> > especially semi-synchronous replication and GTID.
> >
> > GTID (Global Transaction ID) means that there is a counter applying
> > to the replicated databases which is unique within the cluster (there
> > can be multiple replication clusters with overlapping IDs).
> >
> > Semi-synchronous replication means that the master will replicate
> > synchronously to AT LEAST ONE slave before actually committing the
> > transaction. In theory there can then be no data loss due to a single
> > node failure, a big improvement over the normal asynchronous
> > replication in MariaDB.
> >
> > Together these two technologies should allow for quite a
> > straightforward set of semantics in the resource-agent: on master
> > failure, the node with the highest GTID must be the one that was
> > replicating synchronously, and it should be promoted to be the new
> > master. The question is how to relay that information to crmd.
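For concreteness, a minimal sketch of reading the local GTID position
that the scheme above depends on, assuming the mysql client is on the
PATH and hypothetical OCF_RESKEY_* credential parameters (these names
are illustrative, not taken from the existing mysql agent):

    # Sketch: query this node's current GTID position from MariaDB.
    get_local_gtid() {
        mysql --user="$OCF_RESKEY_user" --password="$OCF_RESKEY_password" \
              --batch --skip-column-names \
              -e "SELECT @@GLOBAL.gtid_current_pos;"
    }

    # Semi-synchronous replication itself is configured on the MariaDB
    # side (e.g. rpl_semi_sync_master_enabled=ON and
    # rpl_semi_sync_slave_enabled=ON, with the semisync plugins loaded);
    # the agent only needs to observe the resulting GTID positions.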
So it looks like you have the same requirements as the galera resource
agent. Galera is a "virtually synchronous" replication library for
MySQL, where each node records the latest version of the cluster state
(akin to GTID). The galera resource agent is a master/slave resource
with the same bootstrapping requirement when restarting a cluster from
scratch:

  . during the "start" operation, all nodes store their local state in
    the CIB with crm_attribute
  . once all nodes have stored their state, the next "monitor" operation
    run on any node can determine which node to bootstrap the cluster
    from
  . the "promote" operation takes care of starting the mysql server on
    the node
  . symmetrically, the "demote" operation stops the server

There are a bunch of edge cases specific to the way Galera works, but
you should get the idea.

> > My current working hypothesis is that I can place the GTID in a crm
> > attribute both when starting the resource-agent and in a post-demote
> > notify. During the subsequent monitor operation the resource-agents
> > can then scan the crm attributes from the other nodes and simply
> > prioritise themselves in relation to the others (some relative
> > scoring?).
>
> A bit of a tangent: you can set attributes from a resource agent using
> either crm_attribute or attrd_updater. Each has advantages and
> disadvantages.
>
> crm_attribute can set a permanent or a transient attribute, while
> attrd_updater only sets transient attributes. (A node's transient
> attributes go away when the node reboots or otherwise stops cluster
> services.)
>
> crm_attribute can only set public attributes, while attrd_updater can
> set public or private attributes. Public attributes are recorded in
> the CIB, and changing one triggers a new transition (i.e. the cluster
> checks whether any resources need to be started/stopped/moved).
> Private attributes are not saved to the CIB and do not cause a new
> transition. Public attributes can be referenced in constraint rules,
> while private attributes cannot. Private attributes have been
> supported since Pacemaker 1.1.13.
>
> attrd_updater works with Pacemaker Remote nodes only when the cluster
> nodes use the corosync 2 stack; it is silently ignored for Pacemaker
> Remote nodes when the cluster nodes use a legacy stack
> (heartbeat/cman/corosync-plugin). crm_attribute has worked with remote
> nodes on legacy stacks since Pacemaker 1.1.15.
>
> I'd prefer attrd_updater with private transient attributes if that
> works for your purposes, because it avoids unnecessary recalculation
> of the cluster state plus the associated disk I/O.

> > This requires a few things though:
> >
> > - If there is no master when the resource agent starts (i.e. the
> >   cluster is just starting), we need to wait for all nodes to come
> >   online before promoting any of them to master, so that each can
> >   read the GTIDs from the attributes.
> > - There must be a monitor step after start and demote, and before
> >   the promotion of any resource to master, and it must execute on
> >   all nodes so they can set their priority for promotion.
> > - The post-demote notifier must complete execution before a node can
> >   start the monitor operation. I THINK it is ok for not all nodes to
> >   have completed the post-demote notifier before the monitor
> >   operation starts; this can probably work by creating a sparse
> >   priority distribution: the first node to execute monitor sets a
> >   priority of 100, the next one sets 90, the one after that 95 (in
> >   the middle), and so on, based on the number of nodes (sketched
> >   below).
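Putting the attribute mechanics above into concrete terms, a minimal
sketch of publishing the local GTID so that peers can compare it; the
attribute name "mariadb-gtid" is illustrative, and get_local_gtid is
the hypothetical helper sketched earlier:

    # Private transient attribute (no CIB write, no new transition),
    # as Ken suggests; requires Pacemaker >= 1.1.13:
    attrd_updater --name mariadb-gtid --update "$(get_local_gtid)" --private

    # Public transient alternative via crm_attribute, if the value must
    # be visible in the CIB or referenced in constraint rules:
    crm_attribute --type status --name mariadb-gtid \
                  --update "$(get_local_gtid)"

    # A peer's value can be read back (extracting the value field from
    # attrd_updater's name/host/value output) with:
    attrd_updater --name mariadb-gtid --query --node some-peer \
        | sed -n 's/.*value="\([^"]*\)".*/\1/p'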
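And a sketch of the scoring step in "monitor", feeding a promotion
preference to Pacemaker once the peers' GTIDs are visible. crm_master
is the usual wrapper around crm_attribute used by master/slave agents;
OTHER_NODES and gtid_newer_than are hypothetical placeholders for peer
discovery and GTID comparison, and the scores are purely illustrative:

    set_promotion_score() {
        local score=5 my_gtid peer_gtid peer
        my_gtid=$(get_local_gtid)
        for peer in $OTHER_NODES; do
            peer_gtid=$(attrd_updater --name mariadb-gtid --query \
                            --node "$peer" 2>/dev/null \
                        | sed -n 's/.*value="\([^"]*\)".*/\1/p')
            # gtid_newer_than would compare two GTID positions; newer
            # local state earns a higher promotion score.
            if gtid_newer_than "$my_gtid" "$peer_gtid"; then
                score=$((score + 10))
            fi
        done
        crm_master -l reboot -v "$score"
    }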
> >
> > I hope this doesn't sound too tangled. I will try this out, but I
> > can't find any clear documentation on the ordering and completion of
> > the start, notify, monitor and promote operations, or on master
> > selection, so all pointers are very much welcome.
> >
> > Completely alternative suggestions are also very much welcome.
> >
> > Thanks for any and all assistance,
> > Nils
>
> You may want to look at the ocf:heartbeat:galera agent -- I believe it
> has some similar concerns.

As Ken said :)
