On Sat, 2009-04-18 at 09:58 +0200, Dietmar Maurer wrote:
> > > > > like a 'merge' function? Seems the algorithm for checkpoint recovery
> > > > > always uses the state from the node with the lowest processor id?
> > > > >
> > > > Yes that is right.
> > >
> > > So if I have the following cluster:
> > >
> > > Part1: node2 node3 node4
> > > Part2: node1
> > >
> > > Let's assume Part1 has been running for some time and has gathered some
> > > state in checkpoints. Part2 is just the newly started node1.
> > >
> > > So when node1 starts up, the whole cluster uses the empty checkpoint from
> > > node1? (I guess I am confused somehow.)
> > >
> > > - Dietmar
> > >
> > 
> > The checkpoint service will merge checkpoints from both partitions into
> > one view because both node1 and node2 send out their checkpoint state
> > on a merge operation.
> 
> So does it use a 'merge' function, or always use the state from the node
> with the lowest processor id? If there is a merge function, what
> algorithm is used to merge 2 states?


An older version of the algorithm is described here:
http://www.openais.org/doku.php?id=dev:partition_recovery_checkpoint:checkpoint

The algorithm has since been updated to deal with some race conditions,
but the document is still pretty close.

As you can see, designing the recovery state machine is complicated.
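
To make the scenario from the thread concrete, here is a rough sketch in
Python. It is not openais code and it is not the algorithm from the
document linked above. It only illustrates the idea that each partition
sends its checkpoint state on a merge and the results are combined into
one view. The names representative() and merge_partitions() are
hypothetical. Two things are assumptions on my part: that the sender for
each partition is its lowest processor id, and that a name clash keeps
the first copy seen. The real recovery state machine may resolve both
differently.

```python
# Illustrative sketch only; none of these names are openais APIs.

def representative(partition):
    """Assumption: the lowest processor id in a partition sends its state."""
    return min(partition)


def merge_partitions(partitions):
    """Combine per-partition checkpoint state into one merged view.

    partitions: list of dicts mapping node id -> {checkpoint name: data}.
    Returns a dict mapping checkpoint name -> data.
    """
    merged = {}
    # Visit partitions in order of representative id so the result is
    # deterministic regardless of the order the partitions are listed in.
    for part in sorted(partitions, key=representative):
        state = part[representative(part)]
        for name, data in state.items():
            # Assumption: on a name clash, keep the copy seen first.
            merged.setdefault(name, data)
    return merged


# The cluster from the thread: Part1 has state, Part2 is a fresh node1.
ckpt = {"ckpt_a": "state-a"}
part1 = {2: dict(ckpt), 3: dict(ckpt), 4: dict(ckpt)}
part2 = {1: {}}

view = merge_partitions([part1, part2])
```

In this sketch node1's empty state contributes nothing, so the merged
view still holds the checkpoint gathered by Part1 instead of being
replaced by an empty one.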

> - Dietmar

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
