Persistent Cluster Restart Design Note
Page edited by Alan Conway

Persistent cluster, user perspective

A persistent cluster is one where all members have a persistent store. A cluster must have either all transient or all persistent members; mixed clusters are not allowed.

cluster-size option

cluster-size N
Wait for at least N initial members before recovering from the store and listening for client connections. Use this option so that all brokers in a persistent cluster can exchange the status of their persistent stores and run consistency checks before recovering and serving clients.

TODO: problematic behavior? This may hold up start scripts: if qpidd is run with the daemon and cluster-size N options, it will not return until N members have joined.

Clean and dirty shut-down

Each store is an independent replica of the cluster's state. If a broker crashes while the rest of the cluster continues, its store is "dirty" because it will be out of date with regard to the rest of the cluster. If the broker is restarted to re-join a running cluster, it will discard the dirty store and get an update from an active cluster member to re-synchronize its state.

If the entire cluster is shut down by an administrator using the qpid-cluster -k command, then all brokers shut down at exactly the same point with the same state in their stores. In this case the stores are marked "clean". When the cluster is restarted, brokers with clean stores recover from their stores, and brokers with dirty stores get an update from a clean broker.

Consistency checks

Two UUIDs are saved with each broker's store: cluster-id and shutdown-id. These are used during start-up to detect a mistaken attempt to use mismatched stores.

The cluster-id identifies the persistent cluster state. It remains the same if the cluster is shut down and restarted. It ensures no accidental mixing of stores belonging to different clusters.

The shutdown-id identifies a particular clean shut-down event. It ensures that all clean stores were shut down at the same point.

If there is any mismatch in these IDs, all members of the cluster will log a message and exit.

Manual recovery

If every broker in the cluster crashes, they will all have dirty stores. Manual intervention is required to identify the "best" store to recover from.

TODO: describe manual intervention. We provide a tool to examine each broker's data directory, indicate which is most recent, and mark it as a clean store so the cluster will use it to recover.

Design details

Persistent restart scenarios:
Other requirements:
Persistent cluster

Store states on broker start-up:
cluster-id is stored on the first run of a persistent cluster. It is used to ensure that members are part of the same cluster.

shutdown-id is stored at administrative shut-down of the cluster. It is used to ensure that clean stores are from the same shut-down event.

Initialization
All empty is a valid store state: all members record the same cluster-id and go active. If any stores are non-empty, then:
- All clean members restore from their stores.
- All empty members set the cluster-id from the cluster.
- All dirty or empty members get an update from a clean member.

Joining

If the new member has a non-empty store, its cluster-id must match that of the cluster. The new member gets an update from the cluster.

Manual Recovery

If the entire cluster fails then manual recovery is required. While running, brokers will periodically (on every membership change and at some configured time interval) write a sequence number to disk. Provide tools to examine broker data directories and determine whether they belong to the same cluster (same cluster-id) and, if so, which is the latest based on the sequence number. The recovery procedure is to mark the latest store as clean and restart the cluster.
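
The initialization and manual recovery rules above can be sketched in a few lines of Python. This is an illustration only, not the qpidd implementation: the StoreInfo record, the action strings, and the function names are all hypothetical. It also assumes that a start-up with non-empty stores but no clean store is treated as the manual recovery case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class StoreState(Enum):
    EMPTY = "empty"
    CLEAN = "clean"
    DIRTY = "dirty"


@dataclass
class StoreInfo:
    """Hypothetical summary of one broker's store (names are illustrative)."""
    broker: str
    state: StoreState
    cluster_id: Optional[str] = None   # None for an empty store
    shutdown_id: Optional[str] = None  # set only by a clean shut-down
    sequence: int = 0                  # sequence number periodically written to disk


def check_consistency(stores: List[StoreInfo]) -> None:
    """Raise ValueError if stores mix clusters or clean shut-down events."""
    cluster_ids = {s.cluster_id for s in stores if s.state is not StoreState.EMPTY}
    if len(cluster_ids) > 1:
        raise ValueError("cluster-id mismatch: stores belong to different clusters")
    shutdown_ids = {s.shutdown_id for s in stores if s.state is StoreState.CLEAN}
    if len(shutdown_ids) > 1:
        raise ValueError("shutdown-id mismatch: clean stores from different shut-downs")


def plan_startup(stores: List[StoreInfo]) -> Dict[str, str]:
    """Map each broker to a start-up action, following the initialization rules."""
    check_consistency(stores)
    if all(s.state is StoreState.EMPTY for s in stores):
        # All empty: every member records the same new cluster-id and goes active.
        return {s.broker: "start-new-cluster" for s in stores}
    if not any(s.state is StoreState.CLEAN for s in stores):
        # Assumption: no clean store to update from means manual recovery.
        raise RuntimeError("no clean store: manual recovery required")
    return {
        s.broker: ("recover-from-store" if s.state is StoreState.CLEAN
                   else "update-from-clean-member")
        for s in stores
    }


def latest_store(stores: List[StoreInfo]) -> StoreInfo:
    """Manual recovery: pick the non-empty store with the highest sequence number."""
    candidates = [s for s in stores if s.state is not StoreState.EMPTY]
    if not candidates:
        raise ValueError("no non-empty store to recover from")
    check_consistency(candidates)  # all candidates must share one cluster-id
    return max(candidates, key=lambda s: s.sequence)
```

For example, given one clean store, one dirty store and one empty store with matching cluster-ids, plan_startup assigns "recover-from-store" to the clean member and "update-from-clean-member" to the other two. When every store is dirty, plan_startup raises, and latest_store identifies the store an administrator would mark as clean before restarting.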