Re: [Openais] how to distribute data across all nodes?

2009-04-20 Thread David Teigland
On Fri, Apr 17, 2009 at 10:56:47PM -0700, Steven Dake wrote:
 On Sat, 2009-04-18 at 07:49 +0200, Dietmar Maurer wrote:
like a 'merge' function? Seems the algorithm for checkpoint recovery
always uses the state from the node with the lowest processor id?
   
   Yes that is right.
  
  So if I have the following cluster:
  
  Part1: node2 node3 node4
  Part2: node1
  
  Let's assume Part1 is running for some time and has gathered some state in
  checkpoints. Part2 is just the newly started node1.
  
  So when node1 starts up the whole cluster uses the empty checkpoint from
  node1? (I guess I am confused somehow).
 
 The checkpoint service will merge checkpoints from both partitions into
 one view because both node 1 and node2 send out their checkpoint state
 on a merge operation.

That doesn't make any sense; I can't believe that's how it works. The
resulting content would be complete nonsense.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] how to distribute data across all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 07:49:12AM +0200, Dietmar Maurer wrote:
   like a 'merge' function? Seems the algorithm for checkpoint recovery
   always uses the state from the node with the lowest processor id?
  
  Yes that is right.
 
 So if I have the following cluster:
 
 Part1: node2 node3 node4
 Part2: node1
 
 Let's assume Part1 is running for some time and has gathered some state in
 checkpoints. Part2 is just the newly started node1.
 
 So when node1 starts up the whole cluster uses the empty checkpoint from
 node1? (I guess I am confused somehow).

It is *not* as simple as the node with the lowest nodeid.  It is the node
with the lowest nodeid where the state exists.  When selecting the node to
send state to the others, you obviously need to select among nodes that
have the state :-)

In the dlm_controld example I mentioned earlier, the function called
set_plock_ckpt_node() picks the node that will save state in the ckpt:

        list_for_each_entry(memb, &cg->members, list) {
                if (!(memb->start_flags & DLM_MFLG_HAVEPLOCK))
                        continue;

                if (!low || memb->nodeid < low)
                        low = memb->nodeid;
        }

Only nodes that have state will have the DLM_MFLG_HAVEPLOCK flag set; new
nodes just added by a confchg will not have that flag set.

Dave



Re: [Openais] how to distribute data across all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 03:55:57AM -0700, Steven Dake wrote:
 On Sat, 2009-04-18 at 12:47 +0200, Dietmar Maurer wrote:
  At least the SA Forum does not mention such strange behavior. Isn't that
  a serious bug?

Yes, I'd consider it a serious bug.

  Consider 2 Partitions with one checkpoint:
  
  Part1: CkptSections ABC
  Part2: CkptSections BCD
  
  After the merge, you have: CkptSections ABCD
  
  And even worse, a section can contain data from different partitions (old
  data mixed with new)? And there is no notification that such a thing
  happens?

That ckpt behavior is nonsensical for most real applications, I'd wager.
I'm going to have to go check whether my apps are protected from that.

 The SA Forum doesn't really consider how to handle partitions in a
 network, or at least not very suitably (it's left up to the designer of
 the SA Forum services).  They assume that applications will be using the
 AMF, and rely on the AMF functionality to reboot partitioned nodes
 (fencing) so this condition doesn't occur.

They don't consider it presumably because *it doesn't make any sense*.

 The SA Forum services were not designed with partitioned networks in
 mind.  It is unfortunate, but it is what it is.  If an app needs true
 consistency without some form of fencing, the app designer has to take
 partitions into consideration when designing their applications.
 
 This is why I recommend using CPG for these types of environments
 because it provides better design control over exactly how data
 remerges.

If the SAF services don't specify what should happen when clusters with
divergent state are combined, then it probably means it should not happen,
and you should disallow the unspecified behavior instead of making
something up.

Dave



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Dietmar Maurer
  like a 'merge' function? Seems the algorithm for checkpoint recovery
  always uses the state from the node with the lowest processor id?
 
 Yes that is right.

So if I have the following cluster:

Part1: node2 node3 node4
Part2: node1

Let's assume Part1 is running for some time and has gathered some state in
checkpoints. Part2 is just the newly started node1.

So when node1 starts up the whole cluster uses the empty checkpoint from
node1? (I guess I am confused somehow).

- Dietmar 



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Dietmar Maurer
like a 'merge' function? Seems the algorithm for checkpoint
 recovery
always uses the state from the node with the lowest processor
id?
   
   Yes that is right.
 
  So if I have the following cluster:
 
  Part1: node2 node3 node4
  Part2: node1
 
  Let's assume Part1 is running for some time and has gathered some state in
  checkpoints. Part2 is just the newly started node1.
 
  So when node1 starts up the whole cluster uses the empty checkpoint from
  node1? (I guess I am confused somehow).
 
  - Dietmar
 
 
 The checkpoint service will merge checkpoints from both partitions
into
 one view because both node 1 and node2 send out their checkpoint state
 on a merge operation.

So does it use a 'merge' function, or always use the state from the node
with the lowest processor id? If there is a merge function, what
algorithm is used to merge 2 states?

- Dietmar
 




Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Steven Dake
On Sat, 2009-04-18 at 09:58 +0200, Dietmar Maurer wrote:
 like a 'merge' function? Seems the algorithm for checkpoint
  recovery
 always uses the state from the node with the lowest processor
 id?

Yes that is right.
  
   So if I have the following cluster:
  
   Part1: node2 node3 node4
   Part2: node1
  
   Let's assume Part1 is running for some time and has gathered some state in
   checkpoints. Part2 is just the newly started node1.
  
   So when node1 starts up the whole cluster uses the empty checkpoint from
   node1? (I guess I am confused somehow).
  
   - Dietmar
  
  
  The checkpoint service will merge checkpoints from both partitions
 into
  one view because both node 1 and node2 send out their checkpoint state
  on a merge operation.
 
 So does it use a 'merge' function, or always use the state from the node
 with the lowest processor id? If there is a merge function, what
 algorithm is used to merge 2 states?


An older version of the algorithm is described here:
http://www.openais.org/doku.php?id=dev:partition_recovery_checkpoint:checkpoint

It has been updated to deal with some race conditions, but the document
is pretty close.

As you can see, designing the recovery state machine is complicated.

 - Dietmar
  
 
 



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Dietmar Maurer
 An older version of the algorithm is described here:

http://www.openais.org/doku.php?id=dev:partition_recovery_checkpoint:checkpoint
 
 It has been updated to deal with some race conditions, but the
document
 is pretty close.
 
 As you can see, designing the recovery state machine is complicated.

I still don't get the point. That algorithm computes something, yes. But
can it be described in words, or better in C code?

char *merge_checkpoint_data(char *cp_data_1, char *cp_data_2);

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Steven Dake
On Sat, 2009-04-18 at 10:24 +0200, Dietmar Maurer wrote:
  An older version of the algorithm is described here:
 
 http://www.openais.org/doku.php?id=dev:partition_recovery_checkpoint:checkpoint
  
  It has been updated to deal with some race conditions, but the
 document
  is pretty close.
  
  As you can see, designing the recovery state machine is complicated.
 
 Is there any guarantee that all sections inside a recovered checkpoint
 are from the same cluster partition (I can't see such a restriction in the
 algorithm)?
 

no

The algorithm will merge sections created in both partitions into the
single checkpoint.

 - Dietmar
 



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Steven Dake
On Sat, 2009-04-18 at 12:47 +0200, Dietmar Maurer wrote:
   Is there any guarantee that all sections inside a recovered
  checkpoint
   are from the same cluster partition (I can't see such a restriction in
  the
   algorithm)?
  
  
  no
  
  The algorithm will merge sections created in both partitions into the
  single checkpoint.
 
 At least the SA Forum does not mention such strange behavior. Isn't that
 a serious bug?
 
 Consider 2 Partitions with one checkpoint:
 
 Part1: CkptSections ABC
 Part2: CkptSections BCD
 
 After the merge, you have: CkptSections ABCD
 
 And even worse, a section can contain data from different partitions (old
 data mixed with new)? And there is no notification that such a thing
 happens?
 

The SA Forum doesn't really consider how to handle partitions in a
network, or at least not very suitably (it's left up to the designer of
the SA Forum services).  They assume that applications will be using the
AMF, and rely on the AMF functionality to reboot partitioned nodes
(fencing) so this condition doesn't occur.

The SA Forum services were not designed with partitioned networks in
mind.  It is unfortunate, but it is what it is.  If an app needs true
consistency without some form of fencing, the app designer has to take
partitions into consideration when designing their applications.

This is why I recommend using CPG for these types of environments
because it provides better design control over exactly how data
remerges.

Regards
-steve

 - Dietmar
 



Re: [Openais] how to distribute data across all nodes?

2009-04-18 Thread Dietmar Maurer
 The SA Forum doesn't really consider how to handle partitions in a
 network, or at least not very suitably (it's left up to the designer of
 the SA Forum services).  They assume that applications will be using the
 AMF, and rely on the AMF functionality to reboot partitioned nodes
 (fencing) so this condition doesn't occur.
 
 The SA Forum services were not designed with partitioned networks in
 mind.  It is unfortunate, but it is what it is.  If an app needs true
 consistency without some form of fencing, the app designer has to take
 partitions into consideration when designing their applications.
 
 This is why I recommend using CPG for these types of environments
 because it provides better design control over exactly how data
 remerges.

On the other hand, it should be easy to fix the checkpoint algorithm, or
at least document that behavior.

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-17 Thread Steven Dake
On Sat, 2009-04-18 at 07:49 +0200, Dietmar Maurer wrote:
   like a 'merge' function? Seems the algorithm for checkpoint recovery
   always uses the state from the node with the lowest processor id?
  
  Yes that is right.
 
 So if I have the following cluster:
 
 Part1: node2 node3 node4
 Part2: node1
 
 Let's assume Part1 is running for some time and has gathered some state in
 checkpoints. Part2 is just the newly started node1.
 
 So when node1 starts up the whole cluster uses the empty checkpoint from
 node1? (I guess I am confused somehow).
 
 - Dietmar 
 

The checkpoint service will merge checkpoints from both partitions into
one view because both node 1 and node2 send out their checkpoint state
on a merge operation.



Re: [Openais] how to distribute data across all nodes?

2009-04-16 Thread Dietmar Maurer
 Check out:
 http://www.openais.org/doku.php?id=dev:paritition_recovery
 
 Instead of using the token callback method, you could write your own
 methodology for executing the state machine.

Ah, OK - I think that is what I already do. What I miss is something
like a 'merge' function. Seems the algorithm for checkpoint recovery
always uses the state from the node with the lowest processor id?

- Dietmar




Re: [Openais] how to distribute data across all nodes?

2009-04-16 Thread Steven Dake
On Thu, 2009-04-16 at 08:20 +0200, Dietmar Maurer wrote:
  You might try taking a look at exec/sync.c
  
  it is a synchronization engine.  Basically it takes configuration
  changes into account to call sync_init, sync_process, sync_abort, or
  sync_activate.  These 4 states then activate the new data model.  This
  could easily be done as an addon api that uses CPG and would be a
  welcome addition.
  
  If you want more details on how the state machine works, let me know.
 
 That code seems to directly access the totem protocol (create a new
 token). So I can't use such a thing with CPG? Or what can I learn from
 that code?
 
 - Dietmar
 
 
Check out:
http://www.openais.org/doku.php?id=dev:paritition_recovery

Instead of using the token callback method, you could write your own
methodology for executing the state machine.

regards
-steve




Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread David Teigland
On Tue, Apr 14, 2009 at 02:05:10PM +0200, Dietmar Maurer wrote:
 So CPG provides a framework to implement distributed finite state
 machines (DFSM). But there is no standard way to get the initial state
 of the DFSM. Almost all applications need to get the initial state, so I
 wonder if it would make sense to provide a service which solves that
 problem (at least as an example).
 
 My current solution is:
 
   I introduce a CPG mode, which is either:
 
   DFSM_MODE_SYNC ... CPG is syncing state. Only state synchronization
                      messages are allowed. Other messages are
                      delayed/queued.
 
   DFSM_MODE_WORK ... State is synced across all members - normal
                      operation. Queued messages are delivered when we
                      reach this state.
 
   When a new node joins, CPG immediately changes mode to
   DFSM_MODE_SYNC. Then all members send their state.
 
   When a node has received the states of all members, it computes the new
   state by merging all received states (dfsm_state_merge_fn), and
   finally switches mode to DFSM_MODE_WORK.
 
 Does that make sense?

Yes.  I'm not sure if a generic service for this would be used much or not...
maybe.

Dave



Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread Dietmar Maurer
When a new node joins, CPG immediately change mode to
DFSM_MODE_SYNC. Then all members send their state.
 
When a node received the states of all members, it computes the
new
state by merging all received states (dfsm_state_merge_fn), and
finally switches mode to DFSM_MODE_WORK.
 
  Does that make sense?
 
 Yes.  I'm not sure if a generic service for this would be used much or
 not...
 maybe.

Btw, how large is an average checkpoint in dlm? Just wonder how much
data needs to be transferred. 

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread Dietmar Maurer
   Yes.  I'm not sure if a generic service for this would be used
much
 or
   not...
   maybe.
 
  Btw, how large is an average checkpoint in dlm? Just wonder how much
  data needs to be transferred.
 
 I've never measured, but it's a trivially small amount of data.
Around
 32
 bytes of data per posix lock used on the fs (often zero if apps aren't
 using
 posix locks.)

OK, then performance is not an issue. I just took a quick look at the
dlm code, and I think that code could be simplified with such a generic
service (which would be tested by a larger user base)? Anyway, I will
try to write a prototype to see if it is useful.

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread Steven Dake
On Tue, 2009-04-14 at 14:05 +0200, Dietmar Maurer wrote:
 So CPG provides a framework to implement distributed finite state
 machines (DFSM). But there is no standard way to get the initial state
 of the DFSM. Almost all applications need to get the initial state, so I
 wonder if it would make sense to provide a service which solves that
 problem (at least as an example).
 
 My current solution is:
 
   I introduce a CPG mode, which is either:
 
   DFSM_MODE_SYNC ... CPG is syncing state. Only state synchronization
                      messages are allowed. Other messages are
                      delayed/queued.
 
   DFSM_MODE_WORK ... State is synced across all members - normal
                      operation. Queued messages are delivered when we
                      reach this state.
 
   When a new node joins, CPG immediately changes mode to
   DFSM_MODE_SYNC. Then all members send their state.
 
   When a node has received the states of all members, it computes the new
   state by merging all received states (dfsm_state_merge_fn), and
   finally switches mode to DFSM_MODE_WORK.
 
 Does that make sense?
 

Cool idea, if it weren't totally integrated with CPG but instead an
external service which people could use in addition to CPG.

It would make a great addition as a service engine or c library using
CPG.

Regards
-steve


 - Dietmar
 
 
  On Thu, Apr 09, 2009 at 09:00:08PM +0200, Dietmar Maurer wrote:
If new, normal read/write messages to the replicated state
 continue
  while
the new node is syncing the pre-existing state, the new node needs
  to save
those operations to apply after it's synced.
  
   Ah, that probably works. But can lead to very high memory usage if
  traffic
   is high.
  
  If that's a problem you could block normal activity during the sync
  period.
  
   Is somebody really using that? If so, is there some code available
   (for safe/replay)?
  
  There is no general purpose code.  dlm_controld is an example of a
  program
  doing something like this, http://git.fedorahosted.org/git/dlm.git
  
  It uses cpg to replicate state of posix locks, uses checkpoints to
 sync
  existing lock state to new nodes, and saves messages on a new node
  until it
  has completed syncing (i.e. reading pre-existing state from the
  checkpoint.)
  
  Dave
 
 



Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread Dietmar Maurer
 You might try taking a look at exec/sync.c
 
 it is a synchronization engine.  Basically it takes configuration
 changes into account to call sync_init, sync_process, sync_abort, or
 sync_activate.  These 4 states then activate the new data model.  This
 could easily be done as an addon api that uses CPG and would be a
 welcome addition.
 
 If you want more details on how the state machine works, let me know.

Yes, please tell me more.

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-14 Thread Dietmar Maurer
 Cool idea, if it weren't totally integrated with CPG but instead an
 external service which people could use in addition to CPG.

That's the plan.

My current problem is the state merge function. There should be a
standard way to merge state (when the user does not want to provide his
own merge function). I thought the following is possible:

- always use state from lowest node id

- compare states and do voting?

- timestamp based selection ?

- always use state from oldest member (life time is time diff
since process started)

- use totem ring_id/token_seq somehow ?

- combinations of above merge functions ?

Any other suggestions?

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread Dietmar Maurer
 What I recommend here is to place your local node id in the message
 contents (retrieved via cpg_local_get) and then compare that nodeid to
 incoming messages

Why do you include the local node id in the message? I can compare the
local node id with the sending node id without that, for example:

...
cpg_local_get(cpg_handle, &local_nodeid);
...

static void cpg_deliver_callback (cpg_handle_t handle,
  struct cpg_name *groupName,
  uint32_t nodeid,
  uint32_t pid,
  void *msg,
  int msg_len)
{
...
if (nodeid == local_nodeid) 
...
}

Or do I miss something?

- Dietmar
 




Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread Steven Dake
On Thu, 2009-04-09 at 10:19 +0200, Dietmar Maurer wrote:
  What I recommend here is to place your local node id in the message
  contents (retrieved via cpg_local_get) and then compare that nodeid to
  incoming messages
 
 Why do you include the local node id in the message? I can compare the
 local node id with the sending node id without that, for example:
 
 ...
 cpg_local_get(cpg_handle, &local_nodeid);
 ...
 
 static void cpg_deliver_callback (cpg_handle_t handle,
 struct cpg_name *groupName,
 uint32_t nodeid,
 uint32_t pid,
 void *msg,
 int msg_len)
 {
 ...
   if (nodeid == local_nodeid) 
 ...
 }
 
 Or do I miss something?
 

Sorry I forgot that was in the callback parameter.  You should be good
to go with the method proposed with your code.

Lots of code, hard to remember all the interfaces :)

Regards
-steve


 - Dietmar
  
 
 


Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread Dietmar Maurer
   need for locks.  An example of why not is creation of a resource
  called
   datasetA.
  
   3 nodes:
   node A sends create datasetA
   node B sends create datasetA
   node C sends create datasetA
  
   Only one of those nodes' create-dataset messages will arrive first.
   The remainder will arrive second and third.  Also, vs requires that
   each node sends in the same order, so it may be something like on all
   nodes: B received, C received, A received.
  
   In this case, B creates the dataset, C says dataset exists, A says
   dataset exists.  All nodes see this same ordering.

But how does a node get its initial state? When a node joins it does
not know the state of the other nodes, but it receives state change
messages from other nodes. A distributed state machine only works if a
node knows the state before joining the group.

Is there a 'standard' solution to that problem?

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread Dietmar Maurer
 1. Have an old cpg member (e.g. the one with the lowest nodeid) send
 messages
 containing the state to the new node after it's joined.  These sync
 messages
 are separate from the messages used to read/write the replicated state
 during
 normal operation.

This is not bulletproof. State can change while the joining node is
not 100% synced, resulting in endless sync messages?
 
 2. Have an old cpg member write all the state to a checkpoint (see
 saCkpt)
 when a node joins, it sends a message to the new node when it's done
 writing
 indicating that the checkpoint is ready, and the new node then reads
 the state
 from the checkpoint.

same problem as above.

 There are probably other ways of doing this as well.
 
 If new, normal read/write messages to the replicated state continue
 while the
 new node is syncing the pre-existing state, the new node needs to save
 those
 operations to apply after it's synced.

Ah, that probably works. But can lead to very high memory usage if
traffic is high. Is somebody really using that? If so, is there some
code available (for save/replay)?

- Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 09:00:08PM +0200, Dietmar Maurer wrote:
  If new, normal read/write messages to the replicated state continue while
  the new node is syncing the pre-existing state, the new node needs to save
  those operations to apply after it's synced.
 
 Ah, that probably works. But can lead to very high memory usage if traffic
 is high.

If that's a problem you could block normal activity during the sync period.

 Is somebody really using that? If so, is there some code available
 (for save/replay)?

There is no general purpose code.  dlm_controld is an example of a program
doing something like this, http://git.fedorahosted.org/git/dlm.git

It uses cpg to replicate state of posix locks, uses checkpoints to sync
existing lock state to new nodes, and saves messages on a new node until it
has completed syncing (i.e. reading pre-existing state from the checkpoint.)

Dave


Re: [Openais] how to distribute data across all nodes?

2009-04-09 Thread Dietmar Maurer
  Ah, that probably works. But can lead to very high memory usage if
 traffic
  is high.
 
 If that's a problem you could block normal activity during the sync
 period.

Wow, that 'virtual synchrony' sounds nice at first, but gets incredibly
complex soon ;-)

 
  Is somebody really using that? If so, is there some code available
  (for save/replay)?
 
 There is no general purpose code.  dlm_controld is an example of a
 program
 doing something like this, http://git.fedorahosted.org/git/dlm.git
 
 It uses cpg to replicate state of posix locks, uses checkpoints to
sync
 existing lock state to new nodes, and saves messages on a new node
 until it
 has completed syncing (i.e. reading pre-existing state from the
 checkpoint.)

Thanks for that link,

Dietmar



Re: [Openais] how to distribute data across all nodes?

2009-04-08 Thread Steven Dake
On Wed, 2009-04-08 at 11:02 +0200, Dietmar Maurer wrote:
 Hi all,
 
 each node in my cluster has some data which should be available to all
 nodes. I also want to update that data in a consistent/atomic way:
 
 1.) acquire global lock
 2.) read/update/write data
 3.) release lock
 

Virtual synchrony is tailor-made for this problem.  Send messages via
CPG.  On delivery of the message, take some action to read, update, or
write the data in your internal data structures.  Do not do the
updating/reading/writing on the transmit of the message but only on
_delivery_.

This removes the need to have a lock entirely since the entire update
can be one atomic message and it will be ordered according to virtual
synchrony semantics.

Since every node sees every message in the same order there is never a
need for locks.  An example of why not is creation of a resource called
datasetA.

3 nodes:
node A sends create datasetA
node B sends create datasetA
node C sends create datasetA

Only one of those nodes' create-dataset messages will arrive first.  The
remainder will arrive second and third.  Also, vs requires that each node
sends in the same order, so it may be something like on all nodes:
B received, C received, A received.

In this case, B creates the dataset, C says dataset exists, A says
dataset exists.  All nodes see this same ordering.  The only reason to
use a lock service in practice is because there are no ordering
guarantees in the transmit medium of the system.

Hope that helps

Regards
-steve

 What's the best way to implement that? For locking I can use the lock
 service. But what's the best/simplest way to distribute the node data to
 all members? I tried using CPG, but there is no synchronous message
 delivery available - which makes the above read/update/write impossible?
 

just do all updates/actions/etc on _delivery_ instead of transmit.

 Another option is using CKPT, but CKPT does not have any change
 notification? I need to combine that with CPG?
 
 Maybe there is some example code around where I can see how to implement
 such system? Or some other documentation?
 
 Many thanks for your help,
 
 - Dietmar
 
  
 
 
 