[ https://issues.apache.org/jira/browse/QPID-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782961#action_12782961 ]
Alan Conway commented on QPID-2220:
-----------------------------------

To clarify the situation: the problem is recovering from a total cluster failure with no clean stores. We want to identify the store that is most up to date, i.e. the last one modified with respect to cluster order.

We can do a pretty good job in the cluster code alone by recording config changes. However, if two or more brokers were killed in the same configuration, we'd like a more fine-grained comparison.

Using the Persistence ID (PID) works for the Red Hat store because it is a monotonically increasing value that is incremented for (almost) every change to the store. It is currently not incremented when queues, exchanges, or bindings are deleted. So if we record the PID value N with the config change, and in recovery we find the store is at PID M, then we know there were M-N changes to that store since the config change. That's a number we can meaningfully compare for brokers that died with the same membership.

Factors that make this work:
- a value that increases with each change to the database;
- at runtime, we can query the current value to save at each config change;
- in recovery, we can find the value associated with the database.

Is that something we could have as an optional API on MessageStore, or should we put it in a separate plugin that the store plugin can optionally provide?

> Assisting manual recovery from a complete persistent cluster crash.
> -------------------------------------------------------------------
>
>                 Key: QPID-2220
>                 URL: https://issues.apache.org/jira/browse/QPID-2220
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.5
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> If every member of a persistent cluster crashes, then manual intervention is
> required to identify which store is most up to date, so it can be used to
> recover. We need to provide tools to assist in this identification.
>
> The cluster can save a config-change counter with each config change (cluster
> membership change). In recovery, the broker with the highest config-change
> counter has the best store.
>
> However, if the last brokers in the cluster crash so close together that none
> can record a config change, we need an additional decider.
>
> The store at http://qpidcomponents.org/download.html#persistence maintains a
> global Persistence ID, a 64-bit value that is incremented for each enqueue and
> dequeue. If the cluster stores (config-change, PID) pairs, then in recovery we
> can use (actual PID - config-change PID) as a tiebreaker.
>
> Proposed change to the MessageStore API:
>
>     /** Returns a monotonically increasing value reflecting changes to the store.
>      * The value can wrap around to 0.
>      * Stores need not implement this function; they can simply return 0.
>      */
>     uint64_t getChangeCounter();
>
> The default implementation just returns 0, and the cluster must fall back to
> relying on config-change counts.
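A minimal sketch of how the proposed counter and the recovery tiebreaker could fit together. It assumes a heavily simplified stand-in for MessageStore (the real qpid::broker::MessageStore interface has many more members), and every other name here (CountingStore, ConfigChangeRecord, RecoveredStore, changesSinceConfigChange, moreUpToDate, pickBest) is hypothetical, chosen only for illustration. Stores are ranked first by config-change count; only when those are equal is the number of store changes since that config change (M-N) compared. Because the counter may wrap around to 0, the difference is taken with unsigned 64-bit subtraction, which stays correct as long as fewer than 2^64 changes occurred since the config change.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the broker's MessageStore.
class MessageStore {
public:
    virtual ~MessageStore() = default;
    // Proposed hook: monotonically increasing, may wrap around to 0.
    // Default of 0 means "not supported"; the cluster then relies on
    // config-change counts alone.
    virtual uint64_t getChangeCounter() { return 0; }
};

// Hypothetical store that bumps its counter on every enqueue/dequeue,
// mimicking the Persistence ID behaviour described above.
class CountingStore : public MessageStore {
public:
    void enqueue() { ++pid_; }
    void dequeue() { ++pid_; }
    uint64_t getChangeCounter() override { return pid_; }
private:
    uint64_t pid_ = 0;
};

// Hypothetical (config-change, PID) pair saved by the cluster.
struct ConfigChangeRecord {
    uint64_t configChange;  // cluster membership change counter
    uint64_t pidAtChange;   // store counter N queried at that change
};

// Hypothetical recovery-time view of one broker's store.
struct RecoveredStore {
    std::string broker;
    ConfigChangeRecord last;  // last recorded pair for this broker
    uint64_t currentPid;      // counter M found in the store on recovery
};

// M - N: changes since the last config change. Unsigned arithmetic is
// modulo 2^64, so a single wrap-around of the counter is handled.
uint64_t changesSinceConfigChange(const RecoveredStore& s) {
    return s.currentPid - s.last.pidAtChange;
}

// True if store a looks more up to date than store b.
bool moreUpToDate(const RecoveredStore& a, const RecoveredStore& b) {
    if (a.last.configChange != b.last.configChange)
        return a.last.configChange > b.last.configChange;
    // Same membership: use the PID delta as the tiebreaker.
    return changesSinceConfigChange(a) > changesSinceConfigChange(b);
}

// Returns the best candidate, or nullptr if there are no stores.
// On an exact tie the earlier entry is kept, so a tie still needs
// manual judgement.
const RecoveredStore* pickBest(const std::vector<RecoveredStore>& stores) {
    const RecoveredStore* best = nullptr;
    for (const auto& s : stores)
        if (!best || moreUpToDate(s, *best)) best = &s;
    return best;
}
```

For example, two brokers that both last recorded config change 5 with deltas of 10 and 3 would be ranked by the delta, while a broker whose last recorded config change was 4 loses regardless of its delta. If neither store implements the counter, both deltas are 0 and the comparison cannot decide between them, matching the fallback described in the issue.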