On Sat, 2009-04-18 at 09:58 +0200, Dietmar Maurer wrote:
> > > > > like a 'merge' function? Seems the algorithm for checkpoint recovery
> > > > > always uses the state from the node with the lowest processor id?
> > > > >
> > > > Yes, that is right.
> > >
> > > So if I have the following cluster:
> > >
> > > Part1: node2 node3 node4
> > > Part2: node1
> > >
> > > Let's assume Part1 is running for some time and has gathered some state in
> > > checkpoints. Part2 is just the newly started node1.
> > >
> > > So when node1 starts up the whole cluster uses the empty checkpoint from
> > > node1? (I guess I am confused somehow).
> > >
> > > - Dietmar
> > >
> >
> > The checkpoint service will merge checkpoints from both partitions into
> > one view because both node1 and node2 send out their checkpoint state
> > on a merge operation.
>
> So does it use a 'merge' function, or always use the state from the node
> with the lowest processor id? If there is a merge function, what
> algorithm is used to merge 2 states?

An older version of the algorithm is described here:

http://www.openais.org/doku.php?id=dev:partition_recovery_checkpoint:checkpoint

The algorithm has since been updated to deal with some race conditions, but
the document is still pretty close. As you can see, designing the recovery
state machine is complicated.

> - Dietmar