Re: [Openais] cpg behavior on transitional membership change

2011-09-02 Thread David Teigland
On Fri, Sep 02, 2011 at 10:30:53AM -0700, Steven Dake wrote:
 On 09/02/2011 12:59 AM, Vladislav Bogdanov wrote:
  Hi all,
  
  I'm trying to further investigate the problem I described at
  https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html
  
  The main problem for me there is that pacemaker first sees a transitional
  membership with left nodes, then it sees a stable membership with those
  nodes returned, and does nothing about that. On the other hand,
  dlm_controld sees CPG_REASON_NODEDOWN events on the CPGs related to all its
  lockspaces (at the same time as the transitional membership change) and
  stops the kernel part of each lockspace until the whole cluster is rebooted
  (or until some other recovery procedure which unfortunately does not happen
 
 I believe fenced should reboot the node, but only if there is quorum.
 It is possible your cluster has lost quorum during this series of
 events.  I have copied Dave for his feedback on this point.

I really can't make any sense of the report, sorry.  Maybe reproduce it
without pacemaker, and then describe the specific steps to create the
issue and resulting symptoms.  After that we can determine what logs, if
any, would be useful.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] Announcing Corosync 1.3.0

2011-01-13 Thread David Teigland
On Thu, Jan 13, 2011 at 08:09:13AM -0700, Steven Dake wrote:
 On 01/13/2011 08:03 AM, Lars Marowsky-Bree wrote:
  On 2010-12-01T14:18:25, Steven Dake sd...@redhat.com wrote:
  
  Corosync 1.3.0 is available for immediate download from our website.
  This version brings many enhancements to the software.  The two most
  visible enhancements are UDPU transport mode and the
  cpg_model_initialize api call.  The UDPU transport mode allows Corosync
  to run over basic UDP transport without the need for multicast support
  in the cluster switching environment.  The API addition allows for
  correct operation of cluster file systems such as gfs2 and ocfs2.
 
  
  Hi Steven,
  
  can you elaborate please - how are current cluster file system
  implementations not correct, or is this a mere API enhancement?
  
 
 Dave has more details, but we needed to add an API to give ring id
 information to fenced to prevent fenced from entering a stuck state.
 The conditions leading up to the problem are difficult for me to recall,
 but it did happen in community testing as well as internal validation.
 I recommend pinging dct offline if you need more information.  (There is
 a fenced patch that goes with this, and perhaps something else).

Steve is explaining the addition of cpg_totem_confchg_fn:
http://www.corosync.org/git/?p=corosync.git;a=commitdiff;h=e8b143595cd3b3827c044164873c7825bc65b726

which I used here:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=6dd65cf344d05730cbccb99ce5265e84f762bfde

after asking for it here:
https://lists.linux-foundation.org/pipermail/openais/2009-September/013022.html

I wouldn't explain it in terms of cluster filesystems or fenced.  It's really
addressing a shortcoming in the corosync APIs.  The problem was that it
was impossible to correlate a callback from the cpg library with a
callback from another library for the same underlying event.
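
To make that concrete, here is a minimal sketch of an application wiring up
the V1 model so the same event arrives with a ring id attached (modeled on
corosync's testcpg; assumes the cpg.h interface added by the commit above,
error handling omitted):

#include <stdint.h>
#include <stdio.h>
#include <corosync/cpg.h>

/* Group-level membership callback: no ring id available here. */
static void app_confchg (
	cpg_handle_t handle,
	const struct cpg_name *group_name,
	const struct cpg_address *member_list, size_t member_list_entries,
	const struct cpg_address *left_list, size_t left_list_entries,
	const struct cpg_address *joined_list, size_t joined_list_entries)
{
	printf("confchg: %zu members\n", member_list_entries);
}

/* Totem-level callback: carries the ring id, so this event can be
   correlated with membership callbacks from other corosync libraries. */
static void app_totem_confchg (
	cpg_handle_t handle,
	struct cpg_ring_id ring_id,
	uint32_t member_list_entries,
	const uint32_t *member_list)
{
	printf("totem: ring %u:%llu, %u members\n", ring_id.nodeid,
	       (unsigned long long)ring_id.seq, member_list_entries);
}

static cpg_model_v1_data_t model_data = {
	.cpg_confchg_fn       = app_confchg,
	.cpg_totem_confchg_fn = app_totem_confchg,
	/* deliver an initial totem callback right after joining, so the
	   app learns the current ring id at startup */
	.flags = CPG_MODEL_V1_DELIVER_INITIAL_TOTEM_CONF,
};

int main (void)
{
	cpg_handle_t handle;

	cpg_model_initialize (&handle, CPG_MODEL_V1,
			      (cpg_model_data_t *)&model_data, NULL);
	/* ... cpg_join() and the usual cpg_dispatch() loop ... */
	return 0;
}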

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] [TOTEM ] A processor joined or left the membership and a new membership was formed.

2010-04-23 Thread David Teigland
I'm always looking for ways to make debugging/diagnosing corosync easier
since it's notoriously difficult.  I've always just ignored the messages in
the subject line; they seem more or less equivalent to "something happened".
(The length of corosync messages tends to be inversely proportional to their
usefulness.)

Is there some information we could put in those messages to make them
useful?  I was recently looking at ring id's in my app, and found it would
be helpful to correlate with what appeared in /var/log/messages, but there
are no ring id's there.  Would this message or another be a sensible place
to put a ring id?
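
For example, something along these lines (a purely hypothetical format,
appending the ring id in nodeid:seq form):

[TOTEM ] A processor joined or left the membership and a new membership was formed: ring 1:2128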

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [PATCH corosync] select a new sync member if the node with the lowest nodeid has left.

2010-04-22 Thread David Teigland
On Thu, Apr 22, 2010 at 11:06:19AM +1000, Angus Salkeld wrote:
 Problem:
 
 Under certain circumstances cpg does not send group leave messages.
 
 With a big token timeout (tested with token == 5min).
 1 start all nodes
 2 start ./test/testcpg on all nodes
 3 go to the node with the lowest nodeid
 4 ifconfig int down && killall -9 corosync && /etc/init.d/corosync restart &&
  ./testcpg
 5 the other nodes will not get the cpg leave event
 6 testcpg reports an extra cpg group (basically one was not removed)
 
 Solution:
 If a member gets removed using the new trans_list and
 that member is the node used for syncing (lowest nodeid)
 then the next lowest node needs to be chosen for syncing.
 
 David would you mind confirming that this solves your problem?

It works great, thanks!
Dave


 -Angus
 
 Signed-off-by: Angus Salkeld asalk...@redhat.com
 ---
  services/cpg.c |   36 
  1 files changed, 36 insertions(+), 0 deletions(-)
 
 diff --git a/services/cpg.c b/services/cpg.c
 index ede426f..e9926ac 100644
 --- a/services/cpg.c
 +++ b/services/cpg.c
 @@ -414,6 +414,27 @@ struct req_exec_cpg_downlist {
  
  static struct req_exec_cpg_downlist g_req_exec_cpg_downlist;
  
 +static int memb_list_remove_value (unsigned int *list,
 + size_t list_entries, int value)
 +{
 + int j;
 + int found = 0;
 +
 + for (j = 0; j < list_entries; j++) {
 + if (list[j] == value) {
 + /* mark next values to be copied down */
 + found = 1;
 + }
 + else if (found) {
 + list[j-1] = list[j];
 + }
 + }
 + if (found)
 + return (list_entries - 1);
 + else
 + return list_entries;
 +}
 +
  static void cpg_sync_init_v2 (
   const unsigned int *trans_list,
   size_t trans_list_entries,
 @@ -432,6 +453,21 @@ static void cpg_sync_init_v2 (
   sizeof (unsigned int));
   my_member_list_entries = member_list_entries;
  
 + for (i = 0; i < my_old_member_list_entries; i++) {
 + found = 0;
 + for (j = 0; j < trans_list_entries; j++) {
 + if (my_old_member_list[i] == trans_list[j]) {
 + found = 1;
 + break;
 + }
 + }
 + if (found == 0) {
 + my_member_list_entries = memb_list_remove_value (
 + my_member_list, my_member_list_entries,
 + my_old_member_list[i]);
 + }
 + }
 +
  for (i = 0; i < my_member_list_entries; i++) {
  if (my_member_list[i] < lowest_nodeid) {
   lowest_nodeid = my_member_list[i];
 -- 
 1.6.6.1
 
 
 ___
 Openais mailing list
 Openais@lists.linux-foundation.org
 https://lists.linux-foundation.org/mailman/listinfo/openais
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [PATCH corosync] select a new sync member if the node with the lowest nodeid has left.

2010-04-22 Thread David Teigland
On Thu, Apr 22, 2010 at 04:35:08PM -0500, David Teigland wrote:
 On Thu, Apr 22, 2010 at 11:06:19AM +1000, Angus Salkeld wrote:
  Problem:
  
  Under certain circumstances cpg does not send group leave messages.
  
  With a big token timeout (tested with token == 5min).
  1 start all nodes
  2 start ./test/testcpg on all nodes
  3 go to the node with the lowest nodeid
  4 ifconfig int down && killall -9 corosync && /etc/init.d/corosync
  restart && ./testcpg
  5 the other nodes will not get the cpg leave event
  6 testcpg reports an extra cpg group (basically one was not removed)
  
  Solution:
  If a member gets removed using the new trans_list and
  that member is the node used for syncing (lowest nodeid)
  then the next lowest node needs to be chosen for syncing.
  
  David would you mind confirming that this solves your problem?
 
 It works great, thanks!

That was after two tests, and it may have been a bit hasty...
when I went back to do some further tests, I happened to make a slight
mistake running the usual steps, and the node failure then went unnoticed
like before.  When repeating the mistake intentionally, I get the same
problem.  This new test is:

1 nodes 1,2,3,4: cman_tool join
2 create iptables partition: 1 | 2,3,4
3 node 1: kill -9 corosync
4 remove iptables partition: 1,2,3,4
5 node 1: cman_tool join
6 nodes 1,2,3,4: fenced; fence_tool join
7 create iptables partition: 1 | 2,3,4
8 node 1: kill -9 corosync
9 remove iptables partition: 1,2,3,4
10 node 1: cman_tool join
11 no confchg removing 1 from the fenced cpg on nodes 2,3,4

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] segfault in objdb

2010-04-21 Thread David Teigland
I'm using trunk svnversion 2770.  I ran 'service cman start' on four nodes,
which I do all the time, and one segfaulted here,

Core was generated by `corosync -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x7f1437774eb9 in object_find_next (
object_find_handle=4760538031444721676, object_handle=0x7f1434031b78)
at objdb.c:889
889 ((object_instance->object_name_len ==
Missing separate debuginfos, use: debuginfo-install corosync-1.2.0-1.fc12.x86_64
(gdb) bt
#0  0x7f1437774eb9 in object_find_next (
object_find_handle=4760538031444721676, object_handle=0x7f1434031b78)
at objdb.c:889
#1  0x7f143438c999 in message_handler_req_lib_confdb_object_find (
conn=0xe75b90, message=0x7f142f9ff000) at confdb.c:697
#2  0x7f143813b3af in pthread_ipc_consumer (conn=0xe75b90) at coroipcs.c:701
#3  0x003065206a3a in start_thread () from /lib64/libpthread.so.0
#4  0x003064ade67d in clone () from /lib64/libc.so.6
#5  0x in ?? ()

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-19 Thread David Teigland
On Thu, Apr 08, 2010 at 04:57:22PM +0200, Jan Friesse wrote:
 commit 0d509f4bf23f618c940c3bcdd7cf0e97faf64876
 Author: Jan Friesse jfrie...@redhat.com
 Date:   Thu Apr 8 16:48:45 2010 +0200
 
 CPG model_initialize and ringid + members callback
 
 Patch adds a new function to initialize cpg, cpg_model_initialize. A model
 is a set of callbacks. With this function, future additions of models
 should be possible without changing the ABI.
 
 Patch also contains callback in CPG_MODEL_V1 for notification about
 Totem membership changes.

I've been doing extensive testing with this patch, and it's working well
(2010-04-08-cpg_model+totem_cb.patch); ack from me on going ahead with it.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] stuck on sem_timedwait

2010-04-14 Thread David Teigland
On Wed, Apr 14, 2010 at 12:57:14PM +0200, Jan Friesse wrote:
 David,
 in that case, corosync exits (so it is really not running) or not?

Yep, the corosync process is gone.

 
 David Teigland wrote:
  When corosync exits, my application (fenced) gets stuck.
  
  # strace -p 2005
  Process 2005 attached - interrupt to quit
  restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT
  (Connection timed out)
  poll([{fd=14, events=0}], 1, 0) = 1 ([{fd=14, revents=POLLNVAL}])
  gettimeofday({1271185487, 264}, NULL)   = 0
  futex(0x7f0a66f5b028, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, 
  {1271185489, 0}
  , ) = -1 ETIMEDOUT (Connection timed out)
  poll([{fd=14, events=0}], 1, 0) = 1 ([{fd=14, revents=POLLNVAL}])
  gettimeofday({1271185489, 198}, NULL)   = 0
  futex(0x7f0a66f5b028, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, 
  {1271185491, 0}, ) = -1 ETIMEDOUT (Connection timed out)
  
  0x00338d00d417 in sem_timedwait () from /lib64/libpthread.so.0
  Missing separate debuginfos, use: debuginfo-install cman-3.0.7-1.fc12.x86_64
  (gdb) bt
  #0  0x00338d00d417 in sem_timedwait () from /lib64/libpthread.so.0
  #1  0x003713e02311 in reply_receive (ipc_instance=0x2379ed0, 
  res_msg=0x768c6a50, res_len=16) at coroipcc.c:476
  #2  0x003713e02e7e in coroipcc_msg_send_reply_receive (
  handle=3265522690949120001, iov=0x768c6a80, iov_len=1, 
  res_msg=0x768c6a50, res_len=16) at coroipcc.c:1045
  #3  0x003713a01ed3 in cpg_finalize (handle=5902762718137417729)
  at cpg.c:238
  #4  0x00403542 in close_cpg_daemon ()
  at /root/stable3/fence/fenced/cpg.c:2311
  #5  0x0040b26d in loop (argc=<value optimized out>,
  argv=<value optimized out>) at /root/stable3/fence/fenced/main.c:831
  #6  main (argc=<value optimized out>, argv=<value optimized out>)
  at /root/stable3/fence/fenced/main.c:1045
  
  ___
  Openais mailing list
  Openais@lists.linux-foundation.org
  https://lists.linux-foundation.org/mailman/listinfo/openais
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] stuck on sem_timedwait

2010-04-13 Thread David Teigland
When corosync exits, my application (fenced) gets stuck.

# strace -p 2005
Process 2005 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection
timed out)
poll([{fd=14, events=0}], 1, 0) = 1 ([{fd=14, revents=POLLNVAL}])
gettimeofday({1271185487, 264}, NULL)   = 0
futex(0x7f0a66f5b028, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1271185489, 0}
, ) = -1 ETIMEDOUT (Connection timed out)
poll([{fd=14, events=0}], 1, 0) = 1 ([{fd=14, revents=POLLNVAL}])
gettimeofday({1271185489, 198}, NULL)   = 0
futex(0x7f0a66f5b028, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1271185491, 
0}, ) = -1 ETIMEDOUT (Connection timed out)

0x00338d00d417 in sem_timedwait () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install cman-3.0.7-1.fc12.x86_64
(gdb) bt
#0  0x00338d00d417 in sem_timedwait () from /lib64/libpthread.so.0
#1  0x003713e02311 in reply_receive (ipc_instance=0x2379ed0, 
res_msg=0x768c6a50, res_len=16) at coroipcc.c:476
#2  0x003713e02e7e in coroipcc_msg_send_reply_receive (
handle=3265522690949120001, iov=0x768c6a80, iov_len=1, 
res_msg=0x768c6a50, res_len=16) at coroipcc.c:1045
#3  0x003713a01ed3 in cpg_finalize (handle=5902762718137417729)
at cpg.c:238
#4  0x00403542 in close_cpg_daemon ()
at /root/stable3/fence/fenced/cpg.c:2311
#5  0x0040b26d in loop (argc=<value optimized out>,
argv=<value optimized out>) at /root/stable3/fence/fenced/main.c:831
#6  main (argc=<value optimized out>, argv=<value optimized out>)
at /root/stable3/fence/fenced/main.c:1045

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-09 Thread David Teigland
On Fri, Apr 09, 2010 at 09:33:30AM +0200, Jan Friesse wrote:
 Dave,
 
 
 Oh, and I may have just invented a time machine by merging partitioned
 clusters!
 
 1270661597 cluster node 1 added seq 2128
 1270661597 fenced:daemon conf 3 1 0 memb 1 2 4 join 1 left
 1270661597 cpg_mcast_joined retried 4 protocol
 1270661597 fenced:daemon ring 1:2128 3 memb 1 2 4
 1270661597 fenced:default conf 3 1 0 memb 1 2 4 join 1 left  (*)
 1270661597 add_change cg 5 joined nodeid 1
 1270661597 add_change cg 5 counts member 3 joined 1 remove 0 failed 0
 1270661597 check_ringid cluster 2128 cpg 2:2124
 1270661597 fenced:default ring 1:2128 3 memb 1 2 4  (**)
 1270661597 check_ringid done cluster 2128 cpg 1:2128
 1270661597 check_quorum done
 
 * confchg callback adding node 1
 ** totem callback adding node 1
 
 this is something a little different and it is one of your requirements.

Yes, this ordering makes sense and works.  I was just pointing out that
it's not *always* true that a totem callback precedes a confchg callback
when adding a node.  Obviously Chrissie was thinking about a node starting
up and not the case of partition merging.

 ^^^ This is what you are talking about. Confchg precedes the totem
 callback (as per your requirements)

I never had a hard requirement about callback ordering, because I didn't
know exactly what effect it would have.  But my suggestion was that when
an event caused both confchg and totem callbacks to be queued for a cpg,
the confchg_cb be queued first and the totem_cb be queued second.

Now that I've stepped through my test case a couple times with this issue
in mind, I don't think I actually require any specific ordering of
callbacks.  It looks like things will work the same regardless.

 Anyway, can you please send me (exactly) what problem (original
 problem) are you trying to solve?

My test case that hasn't worked (until now) is the following:

a. members 1,2,3
b. partition 1 / 2,3
c. merge 1,2,3
d. cluster is killed on node 1
e. cluster is started on node 1

In this case nodes 2 and 3 see:

a. cluster = 1,2,3
b. cluster -1 2228
c. cluster +1 2232
d. cluster -1 2236
e. cluster +1 2240

(cluster +/-N M is cman callback adding/removing nodeid N with ringid M)

Node 2 begins fencing node 1 in step b, but I've configured fencing to
fail indefinitely, so the fencing doesn't complete on 2 until step e when
it sees node 1 restart cleanly (without its state).

So *after* step e, node 2 dispatches the following callbacks back to back:

u. conf +1
v. ring +1 2232
w. conf -1
x. ring -1 2236
y. ring +1 2240
z. conf +1

(conf +/-N is confchg callback adding/removing nodeid N)
(ring +/-N M is totem callback adding/removing nodeid N with ringid M)

Two problems I had, which the new ring id resolves:

- When I saw w, I didn't know if this was a new failure that hadn't yet
been reported via a cluster (cman) callback, or whether it was an old
failure.  In this case it corresponds to d, which I now know because the
ringid in x is 2236 and the current cluster ringid is 2240.  This is
important because I need to know whether the current quorum value from
cman is consistent with the state I've seen from cpg.

- I only want to process the latest confchg, because two matching
confchg's, e.g. u and z, are otherwise impossible to uniquely reference
between nodes.  (I refer to both u and z as "confchg adding nodeid 1
resulting in members 1,2,3".)  If I process u, other nodes can sometimes not
tell whether I'm referring to confchg u or z.

Both of these problems resulted in my app (fenced) getting one of those
two things wrong and becoming stuck.
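
For illustration, the shape of the check this enables in the app (a
hand-written sketch with assumed variable names, not fenced's actual code):

#include <stdint.h>

/* Ring id from the last cman statechange callback (cman reports the
   ring sequence as a 32-bit generation number). */
static uint32_t cluster_ringid_seq;

/* Ring id from the last cpg totem callback. */
static uint64_t cpg_ringid_seq;

/* Don't act on queued confchgs until the quorum (cman) view and the
   cpg view refer to the same totem ring; otherwise the current quorum
   value may not match the membership events seen from cpg. */
static int ringids_in_sync (void)
{
	return (uint32_t)cpg_ringid_seq == cluster_ringid_seq;
}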

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-08 Thread David Teigland
On Thu, Apr 08, 2010 at 04:15:06PM +0100, Christine Caulfield wrote:
 On 08/04/10 15:57, Jan Friesse wrote:
 Included is patch solving 2nd problem.
 
 In first problem, I agree with Chrissie, and really don't have any
 single idea how to make regular confchg precede totem_confchg.
 
 We can't. That is the order in which things happen. Short of
 implementing some form of time-machine in corosync it's not going to
 change :S

That makes sense.  I need to go back to the drawing board on all this and
figure out if the totem callback approach is going to solve the problems
I have.

Oh, and I may have just invented a time machine by merging partitioned
clusters!

1270661597 cluster node 1 added seq 2128
1270661597 fenced:daemon conf 3 1 0 memb 1 2 4 join 1 left
1270661597 cpg_mcast_joined retried 4 protocol
1270661597 fenced:daemon ring 1:2128 3 memb 1 2 4
1270661597 fenced:default conf 3 1 0 memb 1 2 4 join 1 left  (*)
1270661597 add_change cg 5 joined nodeid 1
1270661597 add_change cg 5 counts member 3 joined 1 remove 0 failed 0
1270661597 check_ringid cluster 2128 cpg 2:2124
1270661597 fenced:default ring 1:2128 3 memb 1 2 4  (**)
1270661597 check_ringid done cluster 2128 cpg 1:2128
1270661597 check_quorum done

* confchg callback adding node 1
** totem callback adding node 1

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-08 Thread David Teigland
On Thu, Apr 08, 2010 at 04:57:22PM +0200, Jan Friesse wrote:
 Included is patch solving 2nd problem.

Thanks, it works for me.

 In first problem, I agree with Chrissie, and really don't have any
 single idea how to make regular confchg precede totem_confchg.

I've stepped through things and it looks like the confchg/totem callbacks
will work fine as they are, I don't think I need any change because of
ordering.  It just didn't match my initial expectations of what I'd see.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG model_init + callback with totem ringid and members

2010-04-07 Thread David Teigland
On Tue, Apr 06, 2010 at 02:05:00PM +0200, Jan Friesse wrote:
 Same patch but rebased on top of Steve's change (today trunk).

Thanks, this is mostly working well, but I've found one problem, and one
additional thing I need (mentioned on irc already):

1. When a node joins, I get the totem callback before the corresponding
confchg callback.  When a node leaves I get them in the expected order:
confchg followed by totem callback.

2. When my app starts up it needs to be able to get the current ring id,
so we need to be able to get/force an initial totem callback after a
cpg_join that indicates the current ring id.


I've also had a problem getting the current sequence number through
libcman/cman_get_cluster()/ci_generation ---

On node 2 I see:

in cman_dispatch statechange callback:
  call cman_get_cluster(), get generation 2124
  call cman_get_nodes(), see node 1 removed

in cman_dispatch statechange callback:
  call cman_get_cluster(), get generation 2128
  call cman_get_nodes(), see node 1 added

in cman_dispatch statechange callback:
  call cman_get_cluster(), get generation 2128 (expect 2132)
  call cman_get_nodes(), see node 1 removed

in cman_dispatch statechange callback:
  call cman_get_cluster(), get generation 2136
  call cman_get_nodes(), see node 1 added

The second time node 1 is removed I get the previous generation when
node 1 was added instead of generation 2132 which the callback is for.

On node 4 I do get generation 2132 in that callback as expected.  So it
seems like it could be a race, I've only gone through this test once.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG callback with totem ringid + members

2010-03-02 Thread David Teigland
On Tue, Mar 02, 2010 at 11:10:49AM +0100, Jan Friesse wrote:
 I'll give you an example.
 Let's say you have 3 nodes (a,b,c). B,C are joined in group EXAMPLE.
 Now, A will fail ... you will not get a normal confchg, because A was not
 in the group. Now on B, you will run a new cpg process joined to the group.
 If you call cpg_ringid_get, you will get a different result than before A
 failed.
 
 So, the main question is, WHEN should the ringid change? From my point of
 view (because CPG is a lightweight membership), it should change when the
 group changes. But a group change doesn't imply a totem membership change
 (both cases can really happen: group change without Totem membership
 change, and Totem membership change without group change). Of course, if
 we rely on group change, the totem ringid really doesn't make sense any
 longer. If we rely only on Totem membership change, we need something
 like I implemented in cpg_totem_confchg.

The existing totem ring id already has a defined behavior, and I wasn't
expecting anything beyond that.  i.e. cpg_ringid_get would not tell us
anything new about cpg memb changes, only abut totem changes.  So, when a
totem memb change caused a cpg memb change, then it would be useful.
But, it's not really necessary to have this with your other patch, so
let's just leave out the cpg_ringid_get.

  I've started working on the code to use this, and it might be nice if the
  parameters matched the normal confchg parameters as closely as possible,
  i.e. include cpg_name, and use cpg_address instead of uint32_t.
 
 I was thinking about that, but:
 - cpg_name of what? We are talking about Totem membership change. Totem
 doesn't know anything about groups. If you want group_name of
 pid/nodeid/group unique triple, it can be implemented, but ... can you
 feel it doesn't fit very well?

OK, it's probably not necessary.  Is there a totem confchg callback per
handle then?  And I still get all normal cpg confchg callbacks before the
totem_confchg callback?

 The only thing from that structure which is used in a Totem membership
 change is nodeid (which is what we return currently. It's true that
 member_list_entries is really not a good name and should be something like
 node_list_entries).
 
 - What should be filled in as pid? Totem doesn't know about client pids.

Ah, right, I'd not considered that.  It's probably better to keep them
nodeids then.

 - The reason field is very similar. I can imagine returning only
 CPG_REASON_NODEDOWN and/or CPG_REASON_NODEUP.

I don't expect I'll need reason.

Thanks,
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG callback with totem ringid + members

2010-02-26 Thread David Teigland
On Mon, Feb 22, 2010 at 06:00:21PM +0100, Jan Friesse wrote:
 +struct cpg_ring_id {
 + uint32_t nodeid;
 + uint64_t seq;
 +};

What do you think about combining this patch with the other patch that
adds cpg_ringid_get()?  It's troublesome to combine the two patches to
test.

 +typedef void (*cpg_totem_confchg_fn_t) (
 + cpg_handle_t handle,
 + struct cpg_ring_id ring_id,
 + uint32_t member_list_entries,
 + const uint32_t *member_list);

I've started working on the code to use this, and it might be nice if the
parameters matched the normal confchg parameters as closely as possible,
i.e. include cpg_name, and use cpg_address instead of uint32_t.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync - CPG callback with totem ringid + members

2010-02-22 Thread David Teigland
On Mon, Feb 22, 2010 at 06:00:21PM +0100, Jan Friesse wrote:
 Related to https://bugzilla.redhat.com/show_bug.cgi?id=529424
 
 Patch implements a new callback with the current totem ring id and members.
 Included is a modified testcpg using the functionality. As required, the
 callback is delivered AFTER all normal confchg callbacks.
 
 Patch is not 100% tested (especially big endian issues and coexistence
 with whitetank/older versions of corosync) but looks stable.
 
 David, does that functionality fulfill your requirements?

Thanks Honza!  This looks like it will do well, but I can't be certain
until I work through the implementation and testing to use this on my end.
I'll try to get working on that soon so I can let you know.
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] does self-fencing makes sense?

2010-02-22 Thread David Teigland
On Fri, Feb 19, 2010 at 03:31:10PM -0700, Steven Dake wrote:
 There are millions of lines of C code involved in directing a power
 fencing device to fence a node.  Generally in this case, the system
 directing the fencing is operating from a known good state.
 
 There are several hundred lines of C code that trigger a reboot when a
 watchdog timer isn't fed.  Generally in this case, the system directing
 the fencing (itself) has entered an undefined failure state.
 
 So a quick matrix:
 model          LOC       operating environment
 power fencing  millions  well-defined
 self fencing   hundreds  undefined

I completely agree with you that less code is more trustworthy than more
in general.  But your thesis seems to be based entirely on the hundreds
vs millions difference which I simply don't see.  Anyone can configure a
watchdog to replace power fencing today, it's simple, and there will be
negligible difference in the amount of code that's involved.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] [QUORUM] This node is within the primary component and will provide service.

2010-02-19 Thread David Teigland
The corosync logs are so full of these messages that they end up being
unhelpful.  I think they could be made very helpful, though, if they were
printed when the quorum state changed.

Dave

Index: exec/vsf_quorum.c
===
--- exec/vsf_quorum.c   (revision 2662)
+++ exec/vsf_quorum.c   (working copy)
@@ -135,11 +135,12 @@
 	size_t view_list_entries,
 	int quorum, struct memb_ring_id *ring_id)
 {
+	int old_quorum = primary_designated;
 	primary_designated = quorum;
 
-	if (primary_designated) {
+	if (primary_designated && !old_quorum) {
 		log_printf (LOGSYS_LEVEL_NOTICE, "This node is within the primary component and will provide service.\n");
-	} else {
+	} else if (!primary_designated && old_quorum) {
 		log_printf (LOGSYS_LEVEL_NOTICE, "This node is within the non-primary component and will NOT provide any services.\n");
}
 
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] corosync-objctl **binary**

2010-01-13 Thread David Teigland
On Wed, Jan 13, 2010 at 02:49:53PM +1100, Angus Salkeld wrote:
 On Wed, Jan 13, 2010 at 6:06 AM, David Teigland teigl...@redhat.com wrote:
  corosync-objctl used to print a lot of useful information which now
  appears only as **binary**. Is there a way to get that back?
  Perhaps two output modes, one where it prints binary values in hex and
  another where it makes a best effort to interpret and print the values
  in a useful form?
 
  Dave
 
 Hi David
 
 The keys are now typed; keys created through the old API
 default to ANY (or void*). So if we have uses of the old API
 then these objects are printed out as **BINARY**. If they are in actual
 fact strings then we need to update the call to key_create()
 to use the new API, which allows us to pass in the type (in this case
 STRING).

I wonder if there's anything preventing us from using the new API in the
cluster.git code?

 Can you send me the output of objctl.
 
 I just want to see which objects are still not created correctly.

cluster.name=**binary**(9)
cluster.config_version=**binary**(9)
cluster.totem.token=**binary**(9)
cluster.clusternodes.clusternode.name=**binary**(9)
cluster.clusternodes.clusternode.nodeid=**binary**(9)
cluster.clusternodes.clusternode.fence.method.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.name=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.action=**binary**(9)
cluster.clusternodes.clusternode.name=**binary**(9)
cluster.clusternodes.clusternode.nodeid=**binary**(9)
cluster.clusternodes.clusternode.fence.method.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.name=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.action=**binary**(9)
cluster.clusternodes.clusternode.name=**binary**(9)
cluster.clusternodes.clusternode.nodeid=**binary**(9)
cluster.clusternodes.clusternode.fence.method.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.name=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.action=**binary**(9)
cluster.clusternodes.clusternode.name=**binary**(9)
cluster.clusternodes.clusternode.nodeid=**binary**(9)
cluster.clusternodes.clusternode.fence.method.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.name=**binary**(9)
cluster.clusternodes.clusternode.fence.method.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.name=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.port=**binary**(9)
cluster.clusternodes.clusternode.unfence.device.action=**binary**(9)
cluster.fencedevices.fencedevice.name=**binary**(9)
cluster.fencedevices.fencedevice.agent=**binary**(9)
cluster.fencedevices.fencedevice.ipaddr=**binary**(9)
cluster.fencedevices.fencedevice.name=**binary**(9)
cluster.fencedevices.fencedevice.agent=**binary**(9)
cluster.fencedevices.fencedevice.ipaddr=**binary**(9)
cluster.cman.nodename=**binary**(9)
cluster.cman.cluster_id=**binary**(9)
totem.token=**binary**(9)
totem.version=**binary**(9)
totem.nodeid=**binary**(9)
totem.vsftype=**binary**(9)
totem.token_retransmits_before_loss_const=**binary**(9)
totem.join=**binary**(9)
totem.consensus=**binary**(9)
totem.rrp_mode=**binary**(9)
totem.secauth=**binary**(9)
totem.key=**binary**(9)
totem.interface.ringnumber=**binary**(9)
totem.interface.bindnetaddr=**binary**(9)
totem.interface.mcastaddr=**binary**(9)
totem.interface.mcastport=**binary**(9)
libccs.next_handle=**binary**(9)
libccs.connection.ccs_handle=**binary**(9)
libccs.connection.config_version=**binary**(9)
libccs.connection.fullxpath=**binary**(9)
libccs.connection.ccs_handle=**binary**(9)
libccs.connection.config_version=**binary**(9)
libccs.connection.fullxpath=**binary**(9)
libccs.connection.ccs_handle=**binary**(9)
libccs.connection.config_version=**binary**(9)
libccs.connection.fullxpath=**binary**(9)
logging.timestamp=**binary**(9)
logging.to_logfile=**binary**(9)
logging.logfile=**binary**(9)
logging.logfile_priority=**binary**(9)
logging.to_syslog=**binary**(9)
logging.syslog_facility=**binary**(9)
logging.syslog_priority=**binary**(9)
aisexec.user=**binary**(9)
aisexec.group=**binary**(9)
service.name=**binary**(9)
service.ver=**binary**(9)
service.name=**binary**(9)
service.ver=**binary**(9)
quorum.provider=**binary**(9)
service.name=**binary**(9)
service.ver=**binary**(9)
service.name=**binary**(9)
service.ver=**binary**(9)

[Openais] corosync-objctl **binary**

2010-01-12 Thread David Teigland
corosync-objctl used to print a lot of useful information which now
appears only as **binary**.  Is there a way to get that back?
Perhaps two output modes, one where it prints binary values in hex and
another where it makes a best effort to interpret and print the values
in a useful form?

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] [PATCH] corosync/trunk QUORUM log message

2009-12-03 Thread David Teigland
This puts multiple nodeids on each [QUORUM] Members line instead of
putting each nodeid on a separate line.  With more than a few nodes the
excessive lines become a real nuisance, and anyone up around 32 nodes
may literally be scrolling through hundreds of those lines.

Index: vsf_quorum.c
===
--- vsf_quorum.c(revision 2562)
+++ vsf_quorum.c(working copy)
@@ -103,7 +103,35 @@
 static size_t quorum_view_list_entries = 0;
 static int quorum_view_list[PROCESSOR_COUNT_MAX];
 struct quorum_services_api_ver1 *quorum_iface = NULL;
+static char view_buf[64];
 
+static void log_view_list(const unsigned int *view_list, size_t view_list_entries)
+{
+	int total = (int)view_list_entries;
+	int len, pos, ret, cnt;
+	int i = 0;
+
+	while (1) {
+		len = sizeof(view_buf);
+		pos = 0;
+		memset(view_buf, 0, len);
+		cnt = 0;
+
+		for (; i < total; i++) {
+			ret = snprintf(view_buf + pos, len - pos, " %d", view_list[i]);
+			if (ret >= len - pos)
+				break;
+			pos += ret;
+			cnt++;
+		}
+		log_printf (LOGSYS_LEVEL_NOTICE, "Members[%d]:%s%s",
+			    total, view_buf, i < total ? "\\" : "");
+
+		if (i == total)
+			break;
+	}
+}
+
 /* Internal quorum API function */
 static void quorum_api_set_quorum(const unsigned int *view_list,
  size_t view_list_entries,
@@ -123,9 +151,7 @@
 	memcpy(&quorum_ring_id, ring_id, sizeof (quorum_ring_id));
 	memcpy(quorum_view_list, view_list, sizeof(unsigned int)*view_list_entries);
 
-	log_printf (LOGSYS_LEVEL_NOTICE, "Members[%d]: ", (int)view_list_entries);
-	for (i=0; i<view_list_entries; i++)
-		log_printf (LOGSYS_LEVEL_NOTICE, "%d ", view_list[i]);
+	log_view_list(view_list, view_list_entries);
 
/* Tell internal listeners */
send_internal_notification();
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] cherrypicking into flatiron discussion - post 1.1.0

2009-09-21 Thread David Teigland
On Mon, Sep 21, 2009 at 08:35:33AM -0700, Steven Dake wrote:
 4) flatiron to trail trunk with bug resolution
 
 It appears waiting months to cherrypick patches doesn't produce a high
 quality flatiron that people can use continuously.
 
 I'm open to suggestions.  One option is to set some time limit on which
 a bug fix patch will remain in trunk before being merged into flatiron.
 Time open to debate...  7-10 days?

This seems backward, since most testing and bug fixes will originate in
flatiron, and the bug fixes are highly relevant to the flatiron that people
are using and much less relevant to trunk where real use is low.  We're much
more concerned (and it's much more useful) that a bug fix soak in flatiron
as opposed to trunk.

I suggest all bug fixes go immediately into flatiron for testing/soaking.  You
cut this off a week+ prior to releasing from flatiron, of course.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] correlating events

2009-09-11 Thread David Teigland
On Thu, Sep 10, 2009 at 04:11:28PM -0700, Steven Dake wrote:
 IMO the proper way to do this is to ensure whatever ringid was delivered in
 a callback to the application is the current ring id returned by the api.
 This gets rid of any races you describe above.

I can't really think of any races that would concern me.

I described two different queries using one function, maybe it would be
clearer if I described them using two separate functions.

1. cpg_ringid_confchg_cb(id1)
   id1 is the ringid associated with the last cpg confchg callback delivered
   to the app via cpg_dispatch().  If I call cpg_ringid_confchg_cb() from
   within a callback, I will be able to know the ringid associated with
   each confchg.

Of course cpg confchgs (joins/leaves) can happen without a change in the
ringid.  And likewise, the ringid can change without any corresponding cpg
confchg.  Cman on the other hand is always in step with each ringid change.
What I want my app to do is wait until it knows that cpg and cman are in sync
with each other:

1. If cpg has more recent events than cman, then wait for cman to catch up.
   (the cpg_ringid_confchg_cb call above will solve this one)
2. If cman has more recent events than cpg, then wait for cpg to catch up.
   (still looking for a way to do this one)

So the next function is trying to solve 2, and I figured using ringid's again
might be good.  What makes it tricky is that the most recent ringid returned
by cman may not cause a cpg confchg.  The last ringid returned by
cpg_ringid_confchg_cb() may be less than the cman ringid, and waiting for them
to match won't work.  When the cman ringid is greater than the cpg ringid, the
app doesn't know if it's because the cpg callbacks just haven't been delivered
yet, or because there are no cpg callbacks for that ringid.

Functions of various forms could tell us, though.  One possibility:

2. rv = cpg_ringid_done(ringid)
   (I'd pass in the ringid from cman)
   rv would be 0 if there are any undelivered confchgs to the app for the
   ringid provided
   rv would be 1 if all confchgs have been delivered to the app up to and
   including the ringid provided

Or, something like I mentioned in the previous mail where cpg returns the
latest ringid it has seen for which all confchgs (if any) have been delivered
to the app.
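
To make the intended usage concrete, a sketch of the app side if both
queries existed (every name below is a hypothetical proposal from this
thread, not existing API):

#include <stdint.h>

/* Both calls are hypothetical, sketching the proposals above; the real
   names and signatures were still under discussion. */
extern uint64_t cpg_ringid_confchg_cb (void);   /* ringid of the last confchg delivered to the app */
extern int cpg_ringid_done (uint64_t ringid);   /* 1 once all confchgs up to ringid are delivered */

extern void dispatch_cman_callbacks (void);     /* hypothetical app wrappers */
extern void dispatch_cpg_callbacks (void);

static void wait_for_sync (uint64_t cman_ringid)
{
	/* Case 1: cpg has more recent events than cman -- wait for cman. */
	while (cpg_ringid_confchg_cb () > cman_ringid)
		dispatch_cman_callbacks ();

	/* Case 2: cman has more recent events than cpg -- keep dispatching
	   cpg until it confirms nothing is undelivered for this ring. */
	while (!cpg_ringid_done (cman_ringid))
		dispatch_cpg_callbacks ();
}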

  Chrissie pointed out that libcman only returns the 64 bit ringid as uint32,
  but I doubt we'll see ringid's bigger than that; even if we do, I'm just
  comparing consecutive id's, so the lower 32 bits should be fine.
  
 
 Once the ring id is greater than 32 bits, you would always be comparing 0.

I don't follow.

 Looks like cman needs this error corrected, along with the addition of the
 ring leader node id.

 A ring id is uniquely identified by the nodeid of the ring leader and
 the 64 bit value of the ringid.  Need both values in the comparison.

I'm mainly interested in an equal comparison of ringids, but it might be
convenient to know if one came after another.  Would the ringid sequence
number ever not increase and in what situations?

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] correlating events

2009-09-02 Thread David Teigland
On Mon, Aug 31, 2009 at 02:28:33PM -0700, Steven Dake wrote:
 On Mon, 2009-08-31 at 15:44 -0500, David Teigland wrote:
  Here are two related and troublesome problems that would be nice to fix,
  probably in future versions -- they probably can't be fixed maintaining
  existing apis and protocols (although adding new api's to help with them 
  might
  be nice if possible).
  
  1. correlating events from different services locally
  
  I get nodedown from both cman (or quorum service) and cpg.  I need to
  correlate them with each other.  When I get a cpg nodedown for node A, I 
  don't
  know which cman nodedown for A it refers to: one of multiple in the past or
  one in the future that cman hasn't reported yet.
  
 
 Correlation could be solved by addition of api to cman, cpg, and quorum
 to retrieve the globally unique ring id for the last configuration
 change delivered to the application.
 
 If you agree, we can work on the implementation for corosync 1.1.
 Adding this to CPG is trivial, not sure about other services.
 
 Our policies wrt x.y.z would not be violated with this change.
 
 As an example, the API for cpg might look like
 
 cpg_ringid_get (handle, ring_id);
 
 Then ring_id could be memcmp'ed in the application.
 
 This would retrieve the last ring id delivered to the application (not
 the current ring id known to the cpg service).

Turns out that libcman already has a call that returns the ring id, so all I
need now is the addition to cpg.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] correlating events

2009-08-31 Thread David Teigland
Here are two related and troublesome problems that would be nice to fix,
probably in future versions -- they probably can't be fixed maintaining
existing apis and protocols (although adding new api's to help with them might
be nice if possible).

1. correlating events from different services locally

I get nodedown from both cman (or quorum service) and cpg.  I need to
correlate them with each other.  When I get a cpg nodedown for node A, I don't
know which cman nodedown for A it refers to: one of multiple in the past or
one in the future that cman hasn't reported yet.

2. correlating events among nodes

Some kind of global event/generation id associated with each configuration
that nodes can use to refer to the same event.  (For extra credit, make this
useful to detect the first configuration of a cpg across the cluster.)

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] correlating events

2009-08-31 Thread David Teigland
On Mon, Aug 31, 2009 at 02:28:33PM -0700, Steven Dake wrote:
 On Mon, 2009-08-31 at 15:44 -0500, David Teigland wrote:
  Here are two related and troublesome problems that would be nice to fix,
  probably in future versions -- they probably can't be fixed maintaining
  existing apis and protocols (although adding new api's to help with them 
  might
  be nice if possible).
  
  1. correlating events from different services locally
  
  I get nodedown from both cman (or quorum service) and cpg.  I need to
  correlate them with each other.  When I get a cpg nodedown for node A, I 
  don't
  know which cman nodedown for A it refers to: one of multiple in the past or
  one in the future that cman hasn't reported yet.
  
 
 Correlation could be solved by addition of api to cman, cpg, and quorum
 to retrieve the globally unique ring id for the last configuration
 change delivered to the application.
 
 If you agree, we can work on the implementation for corosync 1.1.
 Adding this to CPG is trivial, not sure about other services.
 
 Our policies wrt x.y.z would not be violated with this change.
 
 As an example, the API for cpg might look like
 
 cpg_ringid_get (handle, ring_id);
 
 Then ring_id could be memcmp'ed in the application.
 
 This would retrieve the last ring id delivered to the application (not
 the current ring id known to the cpg service).

Nice, I think that should work well for what I need.  I'd probably call
ringid_get() within the callback itself to get the ringid for it.
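
Roughly like this (cpg_ringid_get and the ring_id struct are the proposed,
not-yet-existing API, so their shapes are assumptions):

#include <stdint.h>
#include <corosync/cpg.h>

/* Assumed shape for the proposed API. */
struct ring_id {
	uint32_t nodeid;
	uint64_t seq;
};
extern cs_error_t cpg_ringid_get (cpg_handle_t handle, struct ring_id *ring_id);

static void app_confchg (
	cpg_handle_t handle,
	const struct cpg_name *group_name,
	const struct cpg_address *member_list, size_t member_list_entries,
	const struct cpg_address *left_list, size_t left_list_entries,
	const struct cpg_address *joined_list, size_t joined_list_entries)
{
	struct ring_id rid;

	/* Called from inside the callback, this would return the ring id
	   for the confchg currently being delivered, which the app can
	   record to correlate with cman's nodedown events. */
	cpg_ringid_get (handle, &rid);
}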


  2. correlating events among nodes
  
  Some kind of global event/generation id associated with each configuration
  that nodes can use to refer to the same event.  (For extra credit, make this
  useful to detect the first configuration of a cpg across the cluster.)
  
 
 define "event" here; do you mean a configuration change event or a message
 delivery event?

I'm thinking of confchg's.  The problem I'm currently working around is when
the app gets a series of cpg confchg's and then wants to communicate and refer
to one of the confchg's in particular.

Dave
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync] - Allow only one connection per (node, pid, grp)

2009-07-20 Thread David Teigland
On Mon, Jul 20, 2009 at 10:03:36AM +0200, Jan Friesse wrote:
 Patch solves the problem where one process connects multiple times to one
 group, by disallowing this situation.
 
 Please see the patch comment for more information.
 
 David, do you agree that this is how cpg should behave, or would you
 rather see support for multiple (node, pid, grp)? (for me, it really
 doesn't make any sense).

Returning an error seems pretty obvious, I can't imagine anyone thinks it
makes sense.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-02 Thread David Teigland
On Thu, Jul 02, 2009 at 11:09:26AM -0700, Steven Dake wrote:
 On Thu, 2009-07-02 at 09:27 -0500, David Teigland wrote:
  On Thu, Jul 02, 2009 at 01:15:18PM +0200, Jan Friesse wrote:
   David Teigland wrote:
On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote:
other nodes should immediately recognize it has
previously failed and process a complete failure for it.

i.e. the full equivalent to what apps (using any api's) would see if the
node had failed via normal token timeout.
  
   More or less agree, but does this patch fix the problem for you or not?
  
  I haven't tried the patch, but based on the description and a quick look at
  the patch, I don't think it helps.  Think more broadly about what's 
  happening
  here, don't focus on one particular effect.
  
  1. nodes 1,2,3,4: are cluster members
  2. nodes 1,2,3,4: are using services A,B,C,D
  3. node4: ifdown eth0, kill corosync
  4. node4: ifup eth0, start corosync
  5. node4: do not start/use any services
  6. nodes 1,2,3: never see node4 removed from membership
  7. nodes 1,2,3: services A,B,C,D never see node4 removed/fail
  
 
 Individual services have to protect against those sorts of restarts.
 The only other mechanism would be to break wire compatibility within
 Totem.  

I'm trying to define my specific problem for you; how/when/where you actually
fix it isn't my main concern at this point.

(I'd suggest starting with a real, proper fix, without regard to compatibility
restrictions.  We'll get that working well.  Then, investigate the options for
backporting the same behavior into stable versions.  Doing that without
breaking compat will often involve some imperfect hacks.)

 This patch resolves the cpg case which is what the original bug was filed
 against.

It may resolve a problem that you're defining, but it doesn't resolve the
problem I'm defining.  Would you like bz 506255 to represent your bug or mine?
If yours, then I'll open a new bz.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-01 Thread David Teigland
On Wed, Jul 01, 2009 at 06:21:14PM +0200, Jan Friesse wrote:
 Included patch should fix
 https://bugzilla.redhat.com/show_bug.cgi?id=506255 .
 
 David, I hope it will fix problem for you.
 
 It's based on the simple idea of adding a node startup timestamp at the
 end of the cpg_join (and joinlist) calls. If the timestamp is larger than
 the old timestamp, we know the node was restarted and we didn't notice -
 deliver a leave event and then a join event. If the timestamp is the same
 (or in special cases lower) - a new cpg app joined - send only a join event.
 
 Of course, the patch isn't so simple. Cpg_join messages are always sent as
 larger messages with a timestamp (btw. the timestamp is a 64-bit value,
 because I expect a l(o^64)ng life for corosync ;) ). On delivery, we test
 if the message is larger than a standard message. If it is - we have a ts -
 use it.
 
 A bigger problem was the joinlist, because it's an array ... you will see
 in the source. The solution is to send a special entry with pid 0 (a
 process should never have pid 0) and the timestamp encoded in the name
 (ugly, but it looks like it works).
 
 Please comment, if you can.

This isn't specifically a cpg bug/problem, it's a problem with
corosync/openais in general.  When a node joins the cluster before others have
recognized it failed, the other nodes should immediately recognize it has
previously failed and process a complete failure for it.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

2009-07-01 Thread David Teigland
On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote:
 other nodes should immediately recognize it has
 previously failed and process a complete failure for it.

i.e. the full equivalent to what apps (using any api's) would see if the node
had failed via normal token timeout.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] change startup notice to Corosync Cluster Engine

2009-06-23 Thread David Teigland
On Mon, Jun 22, 2009 at 10:48:18PM -0700, Steven Dake wrote:
 

While you're there, perhaps knock down the level of those messages so we don't
see it all in /var/log/messages every time?

Jun 22 14:58:12 bull-01 corosync[2343]:   [MAIN  ] Corosync Executive Service 
RELEASE 'trunk'
Jun 22 14:58:12 bull-01 corosync[2343]:   [MAIN  ] Copyright (C) 2002-2006 
MontaVista Software, Inc and contributors.
Jun 22 14:58:12 bull-01 corosync[2343]:   [MAIN  ] Copyright (C) 2006-2008 Red 
Hat, Inc.
Jun 22 14:58:12 bull-01 corosync[2343]:   [MAIN  ] Corosync Executive Service: 
started and ready to provide service.

Everyone else seems to get by with a single "I started" line.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] call for roadmap features for future releases

2009-06-22 Thread David Teigland
On Mon, Jun 22, 2009 at 09:26:06AM -0700, Steven Dake wrote:
 On Mon, 2009-06-22 at 10:59 -0500, David Teigland wrote:
  On Sat, Jun 20, 2009 at 11:51:40AM -0700, Steven Dake wrote:
   I invite all of our contributors to help define the X.Y roadmap of both
   corosync and openais.  Please submit your ideas on this list.  Some
   examples of suggested ideas have been things like converting to libtool.
   Also new service engine ideas are highly welcome.  Keep ideas within a 1
   month - 3 year timeframe.
   
   I intend to publish the roadmap with the release of Corosync and OpenAIS
   1.0.0.  Please submit your ideas by June 26, 2009 (friday).
  
  More apis/tools for querying/reporting internal state.
  
 
 So external (as in not part of the corosync binary) diagnostic tools?

Yes, like corosync-cfgtool, corosync-objctl.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] cpgx stuck

2009-06-03 Thread David Teigland
On Wed, Jun 03, 2009 at 04:28:27PM -0500, David Teigland wrote:
 Running cpgx -d1 on four nodes, where -d1 causes the test to periodically
 kill and restart corosync.  When this kill/restart happens on one node, others
 are typically exiting/joining the cpg at the same time.  The result is
 that cpgx stops receiving any cpg callbacks, and it just sits there forever.

More specifically, it appears that any cpg join gets stuck if the join occurs
during the failure/recovery period of another node that was killed.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync] [patch] - ckpt solution - Change of Makefile.am

2009-05-29 Thread David Teigland
On Wed, May 27, 2009 at 04:15:52PM +0200, Jan Friesse wrote:
 Hi,
 included is a patch for corosync's Makefile.am, so coroipcc.o is no
 longer included in lib... directly, but rather the *.so is a dependency, so
 ipc_hdb is no longer in multiple *.so files and multiple times in a binary,
 which is what causes the problem.
 
 Should solve https://bugzilla.redhat.com/show_bug.cgi?id=499918.
 
 David, can you please confirm, that solved your problem? Thanks.

Yes it does, thanks,
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] new test prog

2009-05-29 Thread David Teigland
I wrote a new program cpgx to test the virtual synchrony guarantees of
corosync and cpg,

http://fedorapeople.org/gitweb?p=teigland/public_git/dct-stuff.git;a=summary

It joins a cpg, then randomly sends messages, leaves or exits, and repeats.
This all creates a random sequence of messages and configuration changes
(events).  Everyone keeps a history of all events, and continually compares
their history against everyone else.  This event history is the replicated
state of the program, upon which all future state is based, and which needs to
be synced to a node when it joins (state transfer).  If any node sees a
different event sequence or content from another (violating VS), it should be
quickly detected and easy to see exactly what was wrong.
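
The core idea, as a simplified sketch (hypothetical types; the real code is
in the git tree above):

#include <stdint.h>
#include <string.h>

/* One entry per delivered message or confchg, in delivery order.
   Virtual synchrony requires every member to record the same sequence. */
struct event {
	uint32_t type;     /* message, join, leave, or failure */
	uint32_t nodeid;   /* node the event concerns */
};

#define MAX_EVENTS 4096
static struct event history[MAX_EVENTS];
static uint32_t history_len;

/* Compare a history received from another node against our own; any
   divergence in the common prefix is a VS violation to report. */
static int history_matches (const struct event *other, uint32_t other_len)
{
	uint32_t n = (history_len < other_len) ? history_len : other_len;
	return !memcmp (history, other, n * sizeof (struct event));
}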

It's simple to run, just start cpgx on up to 8 nodes running corosync, one
instance per node; nodes must have nodeid's between 1 and 255.  If there's a
problem it will stop running with an ERROR message.

It only tries to prove VS behavior, but it incidentally tests other aspects of
corosync also, e.g. it quickly reproduces this recent regression:
https://lists.linux-foundation.org/pipermail/openais/2009-May/012138.html

With the non-default -d1 option it will include approximated node failures in
the random mix of events by periodically killing corosync and restarting it
with cman_tool.  (I may later use iptables to simulate more realistic node
failures.)  It's not default because it often causes corosync to hang;
apparently one of those incidental other bugs.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync trunk] fix confchg races in cpg

2009-05-28 Thread David Teigland
On Thu, May 21, 2009 at 07:36:28AM -0700, Steven Dake wrote:
 It is possible, with 3+ nodes joining or leaving at the same time, for a
 configuration change to be delivered to a user it is not meant
 for.
 
 This patch solves that problem.

ack, using this patch I can't reproduce the problem

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [corosync + openais] [patch] Dispatch return bad handle - proposed solution

2009-05-19 Thread David Teigland
On Tue, May 19, 2009 at 03:40:53PM +0200, Jan Friesse wrote:
 Hi,
 attached is a proposed solution for the *dispatch* functions, which return
 CS_ERR_BAD_HANDLE (AIS_ERR_BAD_HANDLE (9)).
 
 David, can you please test them, and give results?

Thanks, I tried the corosync patch, and cpg_dispatch error 9 is gone.
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [PATCH] fix delayed shutdown

2009-05-18 Thread David Teigland
On Mon, May 18, 2009 at 01:44:50PM +0100, Chrissie Caulfield wrote:
 Steven Dake wrote:
  I don't think this will be backwards compatible with whitetank.  IMO use
  the memb_join_message_send function as outlined.  If you can show it
  works with whitetank then looks good for commit.
  
 
 OK, here's a new patch that doesn't create a new message type. The
 reason I had that in before was due to another bug I hadn't spotted :S
 
 I've tested this against whitetank and it works fine.

Thanks, this works nicely.
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] saCkptSectionIterationNext() error

2009-05-07 Thread David Teigland
On Thu, May 07, 2009 at 12:46:33AM -0700, Steven Dake wrote:
 On Wed, 2009-05-06 at 16:26 -0500, David Teigland wrote:
  I think we may have lost something in transit between irc/email/svn,
  
  Mar 26 16:10:20 dct   confchg, node1 create ckpt, node2 open ckpt, node2
read ckpt -> fail
  
  Mar 26 16:10:46 dct   nodeid 1 creates the ckpt
  
  Mar 26 16:13:42 dct   saCkptCheckpointOpen() works,
  saCkptSectionIterationInitialize() works,
  then saCkptSectionIterationNext() fails
  
  Mar 26 16:30:34 sdake wow iteration fails straight up single node
  Mar 26 16:30:39 sdake that was working like 1 week ago or less
  Mar 26 16:52:30 sdake dct found problem
  Mar 26 16:52:32 sdake patch coming to list now
  
  This looks like the patch, but I don't see it in svn
  https://lists.linux-foundation.org/pipermail/openais/2009-March/011048.html
  
  And I'm still getting error 9 (BAD_HANDLE) from 
  saCkptSectionIterationNext().
  
  Dave
  
 
 That fix is in openais trunk and handle iteration works for me.  Are you
 using openais-0.96.tar.gz?

svn trunk,

[openais/trunk/services]% svn info ckpt.c
Path: ckpt.c
Name: ckpt.c
URL: svn+ssh://svn.fedorahosted.org/svn/openais/trunk/services/ckpt.c
Repository Root: svn+ssh://svn.fedorahosted.org/svn/openais
Repository UUID: fd59a12c-fef9-0310-b244-a6a79926bd2f
Revision: 1888
Node Kind: file
Schedule: normal
Last Changed Author: sdake
Last Changed Rev: 1862
Last Changed Date: 2009-04-25 20:48:50 -0500 (Sat, 25 Apr 2009)
Text Last Updated: 2009-04-27 14:21:47 -0500 (Mon, 27 Apr 2009)
Checksum: 674843c3c135e651655eed1beab88a1b

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] cpg_dispatch BAD_HANDLE

2009-05-07 Thread David Teigland
I recently started getting BAD_HANDLE errors from cpg_dispatch() when leaving
a cpg:

- cpg_leave()
- cpg_dispatch(handle, CPG_DISPATCH_ALL)
- dispatch executes a confchg for the leave
- dispatch returns 9

It doesn't break anything, but I'd like to avoid adding code to detect when I
should or shouldn't ignore BAD_HANDLE errors.
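
For reference, the sequence in code, with a hypothetical helper that tolerates
the error (a sketch only; CS_ERR_BAD_HANDLE is the corosync spelling of error
9, and constant spellings vary between openais and corosync releases):

#include <corosync/cpg.h>

/* hypothetical helper: leave the group, drain remaining callbacks, and
   tolerate the BAD_HANDLE that dispatch returns after our own leave */
static int leave_and_drain(cpg_handle_t handle, struct cpg_name *name)
{
        cs_error_t error;

        error = cpg_leave(handle, name);
        if (error != CS_OK)
                return -1;

        /* the confchg for our own leave is delivered here */
        error = cpg_dispatch(handle, CPG_DISPATCH_ALL);
        if (error != CS_OK && error != CS_ERR_BAD_HANDLE)
                return -1;

        return (cpg_finalize(handle) == CS_OK) ? 0 : -1;
}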

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-05-06 Thread David Teigland
On Mon, Apr 13, 2009 at 02:17:00PM -0500, David Teigland wrote:
 On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote:
  On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote:
   0. configure token timeout to some long time that is longer than all the
  following steps take
   
   1. cluster members are nodeid's: 1,2,3,4
   
   2. cpg foo has the following members:
  nodeid 1, pid 10
  nodeid 2, pid 20
  nodeid 3, pid 30
  nodeid 4, pid 40
   
   3. nodeid 4: ifdown eth0, kill corosync, kill pid 40
  (optionally reboot this node now)
   
   4. nodeid 4: ifup eth0, start corosync
   
   5. members of cpg foo (1:10, 2:20, 3:30) all get a confchg
  showing that 4:40 is not a member
   
   6. nodeid 4: start process pid 41 that joins cpg foo
   
   7. members of cpg foo (1:10, 2:20, 3:30, 4:41) all get a confchg
  showing that 4:41 is a member
   
   (Steps 6 and 7 should work the same even if the process started in step 6
   has pid 40 instead of pid 41.)
 
  100% agree that is how it should work.  If it doesn't, we will fix it.
  The only thing that may be strange is if pid in step 6 is the same pid
  as 40.  Are you certain the test case which fails has a differing pid at
  step 6?
 
 If you fix step 5, then I suspect steps 6,7 will just work.  After the test
 failed at step 5 I didn't pay too much attention to 6,7... but I'm sure that
 the pid in step 6 was different (I didn't reboot the node).

It's not clear what the plan was for this, any recent related changes I should
try?
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-05-06 Thread David Teigland
On Wed, May 06, 2009 at 02:10:27PM -0700, Steven Dake wrote:
 On Wed, 2009-05-06 at 15:04 -0500, David Teigland wrote:
  On Mon, Apr 13, 2009 at 02:17:00PM -0500, David Teigland wrote:
   On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote:
On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote:
 0. configure token timeout to some long time that is longer than all the
following steps take
 
 1. cluster members are nodeid's: 1,2,3,4
 
 2. cpg foo has the following members:
nodeid 1, pid 10
nodeid 2, pid 20
nodeid 3, pid 30
nodeid 4, pid 40
 
 3. nodeid 4: ifdown eth0, kill corosync, kill pid 40
(optionally reboot this node now)
 
 4. nodeid 4: ifup eth0, start corosync
 
 5. members of cpg foo (1:10, 2:20, 3:30) all get a confchg
showing that 4:40 is not a member
 
 6. nodeid 4: start process pid 41 that joins cpg foo
 
 7. members of cpg foo (1:10, 2:20, 3:30, 4:41) all get a confchg
showing that 4:41 is a member
 
 (Steps 6 and 7 should work the same even if the process started in step 6
 has pid 40 instead of pid 41.)
   
100% agree that is how it should work.  If it doesn't, we will fix it.
The only thing that may be strange is if pid in step 6 is the same pid
as 40.  Are you certain the test case which fails has a differing pid at
step 6?
   
   If you fix step 5, then I suspect steps 6,7 will just work.  After the test
   failed at step 5 I didn't pay too much attention to 6,7... but I'm sure that
   the pid in step 6 was different (I didn't reboot the node).
  
  It's not clear what the plan was for this, any recent related changes I
  should try?
  Dave
  
 
 I haven't tried corosync with this test case, but it should work now.
 Did you try latest corosync on this case?   If it still fails Jan can
 address before 1.0.

Just tried it, and I get the same behavior as before.
Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] saCkptSectionIterationNext() error

2009-05-06 Thread David Teigland
I think we may have lost something in transit between irc/email/svn,

Mar 26 16:10:20 dct   confchg, node1 create ckpt, node2 open ckpt, node2
read ckpt -> fail

Mar 26 16:10:46 dct   nodeid 1 creates the ckpt

Mar 26 16:13:42 dct   saCkptCheckpointOpen() works,
saCkptSectionIterationInitialize() works,
then saCkptSectionIterationNext() fails

Mar 26 16:30:34 sdake wow iteration fails straight up single node
Mar 26 16:30:39 sdake that was working like 1 week ago or less
Mar 26 16:52:30 sdake dct found problem
Mar 26 16:52:32 sdake patch coming to list now

This looks like the patch, but I don't see it in svn
https://lists.linux-foundation.org/pipermail/openais/2009-March/011048.html

And I'm still getting error 9 (BAD_HANDLE) from saCkptSectionIterationNext().
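
For reference, the failing sequence looks roughly like this (a sketch against
the SAF checkpoint API; the checkpoint open and error handling are elided, and
ckpt_handle is assumed to come from saCkptCheckpointOpen()):

#include <stdio.h>
#include <saCkpt.h>

static void dump_sections(SaCkptCheckpointHandleT ckpt_handle)
{
        SaCkptSectionIterationHandleT iter;
        SaCkptSectionDescriptorT desc;
        SaAisErrorT rv;

        rv = saCkptSectionIterationInitialize(ckpt_handle,
                                              SA_CKPT_SECTIONS_ANY, 0, &iter);
        if (rv != SA_AIS_OK)
                return;

        /* this is the call that returns SA_AIS_ERR_BAD_HANDLE (9) */
        while (saCkptSectionIterationNext(iter, &desc) == SA_AIS_OK)
                printf("section size %llu\n",
                       (unsigned long long)desc.sectionSize);

        saCkptSectionIterationFinalize(iter);
}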

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] Partition Recovery and CPG

2009-04-30 Thread David Teigland
On Thu, Apr 16, 2009 at 12:29:27PM -0500, David Teigland wrote:
 VS guarantees that all cpg members will see the same sequence of messages
 and configuration changes, i.e. history of events.  If a cpg is partitioned,
 that immediately violates VS.  One part must be killed so that the remaining
 nodes will all agree on one version of history, thus maintaining VS.
 Partitioning can't be avoided, so an application must be able to deal with
 it and kill/stop one part (assuming the app depends on VS.)

 corosync might make this easier by not merging cpg's (or even whole
 clusters) that have been partitioned, but that raises other questions and
 I've been told that doing it would be next to impossible.

I've done more reading, and it's become clear why corosync works the way it
does, and shouldn't really be blamed.  Corosync implements the totem protocol
which is Extended Virtual Synchrony.  I never knew the difference between
Virtual Synchrony and Extended Virtual Synchrony.

It turns out that this partitioning/remerging behavior is exactly what makes
EVS different from VS.  EVS/totem assumes that an app wants to continue
running after being partitioned, so it extends some message ordering
guarantees among nodes in both partitions.  Messages sent just before the
partition may be delivered in both partitions, and the idea behind EVS is that
these messages will be delivered in the same order in the separate partitions
(even though other messages, and confchg's of course, will be different).

To get this message ordering between partitions without violating other rules,
EVS adds a second transitional configuration change.  So an app sees two
configuration changes, the first removing nodes, the second potentially adding
nodes.

It seems that most apps want the standard VS behavior, and there's some doubt
about whether the EVS behaviors would really be wanted or needed in real
applications.  So, our apps are left doing some extra work to reduce the EVS
behavior to something closer to traditional VS.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [Corosync] Patch - Decouple shutdown ordering from objdb position

2009-04-29 Thread David Teigland
On Wed, Apr 29, 2009 at 02:28:05PM +0200, Andrew Beekhof wrote:
 At the moment, startup and shutdown ordering is controlled by the  
 plugin's position in an objdb list.
 
 This is particularly problematic for cluster resource managers which  
 must be unloaded/stopped first.
 The reason for this is that they (or the resources they control) need  
 access to at least some of the other services provided by Corosync.
 
 Based on input from Steve, this patch resolves the shutdown side of  
 the equation and if it's acceptable I'll work on the startup side of 
 things.

I wonder if the recent cfg shutdown api from chrissie would be relevant to
this?

It also brings up the question of whether corosync should have a program to
start/stop the corosync daemon?

  corosync_tool join to set a config method and start corosync

  corosync_tool leave to stop corosync if it's not being used by apps
  (would use the cfg api)

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [PATCH] corosync/trunk: add logging backward compatibility config layer

2009-04-21 Thread David Teigland
On Tue, Apr 21, 2009 at 07:43:04PM +0200, Fabio M. Di Nitto wrote:
 On Tue, 2009-04-21 at 08:51 -0500, Ryan O'Hara wrote:
  On Tue, Apr 21, 2009 at 06:06:25AM +0200, Fabio M. Di Nitto wrote:
   Hi guys,
   
   in order to match the new logging config spec, 2 logging config keywords
   had to be changed.
  
  Can you direct me to this spec?
 
 Check cluster-devel and openais mailing list. The logging directives
 have been discussed to death a few tons of times.. I don't have a URL to
 the archive handy.

The cluster.conf man page shows at least my own version of what we're aiming
for:

http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=blob;f=config/man/cluster.conf.5;h=d9e50d4e6b4569b78a6b4007f94da6f20001fa46;hb=refs/heads/STABLE3

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] howto distribute data across all nodes?

2009-04-20 Thread David Teigland
On Fri, Apr 17, 2009 at 10:56:47PM -0700, Steven Dake wrote:
 On Sat, 2009-04-18 at 07:49 +0200, Dietmar Maurer wrote:
like a 'merge' function? Seems the algorithm for checkpoint recovery
always uses the state from the node with the lowest processor id?
   
   Yes that is right.
  
  So if I have the following cluster:
  
  Part1: node2 node3 node4
  Part2: node1
  
  Let assume Part1 is running for some time and has gathered some state in
  checkpoints. Part2 is just the newly started node1.
  
  So when node1 starts up the whole cluster uses the empty checkpoint from
  node1? (I  guess I am confused somehow).
 
 The checkpoint service will merge checkpoints from both partitions into
 one view because both node 1 and node2 send out their checkpoint state
 on a merge operation.

That doesn't make any sense, I can't believe that's how it works, the
resulting content would be complete nonsense.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] howto distribute data across all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 07:49:12AM +0200, Dietmar Maurer wrote:
   like a 'merge' function? Seems the algorithm for checkpoint recovery
   always uses the state from the node with the lowest processor id?
  
  Yes that is right.
 
 So if I have the following cluster:
 
 Part1: node2 node3 node4
 Part2: node1
 
 Let assume Part1 is running for some time and has gathered some state in
 checkpoints. Part2 is just the newly started node1.
 
 So when node1 starts up the whole cluster uses the empty checkpoint from
 node1? (I  guess I am confused somehow).

It is *not* as simple as the node with the lowest nodeid.  It is the node with
the lowest nodeid *among those where the state exists*.  When selecting the
node to send state to others, you obviously need to select among nodes that
have the state :-)

In the dlm_controld example I mentioned earlier, the function called
set_plock_ckpt_node() picks the node that will save state in the ckpt:

list_for_each_entry(memb, &cg->members, list) {
        if (!(memb->start_flags & DLM_MFLG_HAVEPLOCK))
                continue;

        if (!low || memb->nodeid < low)
                low = memb->nodeid;
}

Only nodes that have state will have the DLM_MFLG_HAVEPLOCK flag set; new
nodes just added by a confchg will not have that flag set.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] Partition Recovery and CPG

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 09:37:26AM +0200, Dietmar Maurer wrote:
  Yes, forcing the losers to reset and start from scratch is a must, but we
  end up doing that a layer above corosync.  That means the losers often
  reappear again through corosync/cpg prior to being forced out. 
 
 Are you talking about an implementation bug, or a 'babbling idiot' which
 simply joins/leaves many times?

It may make sense to allow cpg partitions and merges for apps that do not
require the VS guarantees from cpg.  It does not make sense for apps (like
mine) that rely on VS.

The cause of transient partitions/merges during normal operation is largely
unknown.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] howto distribute data across all nodes?

2009-04-20 Thread David Teigland
On Sat, Apr 18, 2009 at 03:55:57AM -0700, Steven Dake wrote:
 On Sat, 2009-04-18 at 12:47 +0200, Dietmar Maurer wrote:
  At least the SA Forum does not mention such strange behavior. Isn't that
  a serious bug?

Yes, I'd consider it a serious bug.

  Consider 2 Partitions with one checkpoint:
  
  Part1: CkptSections ABC
  Part2: CkptSections BCD
  
  After the merge, you have: CkptSections ABCD
  
  And even worse, section contains data from different partitions (old
  data mixed with new one)? And there is no notification that such things
 happen?

That ckpt behavior is nonsensical for most real applications I'd wager.
I'm going to have to go check whether my apps are protected from that.

 The SA Forum doesn't consider at all how to handle partitions in a
 network or at least not very suitably (up to designer of SA Forum
 services).  They assume that applications will be using the AMF, and
 rely on the AMF functionality to reboot partitioned nodes (fencing) so
 this condition doesn't occur.

They don't consider it presumably because *it doesn't make any sense*.

 The SA Forum services were not designed with partitioned networks in
 mind.  It is unfortunate, but it is what it is.  If an app needs true
 consistently without some form of fencing, the app designer has to take
 partitions into consideration when designing their applications.
 
 This is why I recommend using CPG for these types of environments
 because it provides better design control over exactly how data
 remerges.

If the SAF services don't specify what should happen when clusters with
divergent state are combined, then it probably means it should not happen, and
you should probably not allow the unspecified behavior instead of making
something up.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] Partition Recovery and CPG

2009-04-16 Thread David Teigland
On Thu, Apr 16, 2009 at 12:38:19PM +0200, Dietmar Maurer wrote:
 Lest assume the cluster is partitioned:
 
 Part1: node1 node2 node3
 Part2: node4 node5
 
 After recovery, what join/leave messaged do I receive with a CPG:
 
 A.) JOIN: node4 node5
 
 or
 
 B.) JOIN: node1 node2 node3
 
 or anything else?

In practice I believe you'll see:

nodes 1-3 get a confchg with members=1,2,3,4,5 joined=4,5
nodes 4-5 get a confchg with members=1,2,3,4,5 joined=1,2,3

The issue of partitioning and merging has been a big issue over the years, and
is a very serious problem for any application requiring the properties of
virtual synchrony.

VS guarantees that all cpg members will see the same sequence of messages and
configuration changes, i.e. history of events.  If a cpg is partitioned, that
immediately violates VS.  One part must be killed so that the remaining nodes
will all agree on one version of history, thus maintaining VS.  Partitioning
can't be avoided, so an application must be able to deal with it and kill/stop
one part (assuming the app depends on VS.)

Once a partition exists, a merge back together doesn't change the fact that
the disagreement has already occurred (at partition time) and that disagreement
can only be resolved (to maintain VS) by killing nodes that don't agree with
one version of the history.

My applications use quorum to block activity in minority partitions.  They
also exchange messages to detect merges of prior partitions, and then
kill/block nodes that *were* in a minority partition to maintain VS in the
majority.

(Note that a *single* node:process joining the cpg doesn't mean that it wasn't
partitioned by itself and is now merging.)

corosync might make this easier by not merging cpg's (or even whole clusters)
that have been partitioned, but that raises other questions and I've been told
that doing it would be next to impossible.

We have a lot of experience with these situations because of corosync's
tendency to form spurious, transient partitions where a partition is created
and then immediately merged again in fractions of a second.  This doesn't
happen much any more with small clusters, but it does when you get up toward
32 nodes.  This is the most significant item on the list of suggested
improvements I recently sent out.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] delayed shutdown

2009-04-15 Thread David Teigland
If I run 'cman_tool leave' on four nodes in parallel, node1 will leave right
away, but the other three nodes don't leave until the token timeout expires
for node1 causing a confchg for it, after which the other three all leave
right away.  This has only been annoying me recently, so I think it must have
been some recent change.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] howto distribute data across all nodes?

2009-04-14 Thread David Teigland
On Tue, Apr 14, 2009 at 02:05:10PM +0200, Dietmar Maurer wrote:
 So CPG provides a framework to implement distributed finite state
 machines (DFSM). But there is no standard way to get the initial state
 of the DFSM. Almost all applications need to get the initial state, so I
 wonder if it would make sense to provide a service which solves that
 problem (at least as a example).
 
 My current solution is:
 
   I introduce a CPG mode, which is either:
 
   DFSM_MODE_SYNC ... CPG is syncing state.  Only state synchronization
  messages are allowed.  Other messages are
  delayed/queued.
 
   DFSM_MODE_WORK ... STATE is synced across all members - normal
  operation.  Queued messages are delivered when
  we reach this state.
 
   When a new node joins, CPG immediately changes mode to
   DFSM_MODE_SYNC. Then all members send their state.
 
   When a node received the states of all members, it computes the new
   state by merging all received states (dfsm_state_merge_fn), and
   finally switches mode to DFSM_MODE_WORK.
 
 Does that make sense?

Yes.  I'm not sure if a generic service for this would be used much or not...
maybe.
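
Something like this, in outline (a sketch of the mode handling only, with
invented names and no transport or state merging):

#include <stdlib.h>
#include <string.h>

enum dfsm_mode {
        DFSM_MODE_SYNC,         /* syncing state: queue normal messages */
        DFSM_MODE_WORK,         /* state synced: deliver normally */
};

struct queued_msg {
        struct queued_msg *next;
        size_t len;
        char data[];
};

static enum dfsm_mode mode = DFSM_MODE_WORK;
static struct queued_msg *queue_head, *queue_tail;

static void deliver(const char *data, size_t len)
{
        /* app-specific processing of a message in WORK mode */
}

/* confchg with a joiner: enter SYNC and start the state exchange */
static void on_join_confchg(void)
{
        mode = DFSM_MODE_SYNC;
        /* ... each member now multicasts its state ... */
}

/* normal messages arriving during SYNC are queued, not delivered */
static void on_normal_msg(const char *data, size_t len)
{
        struct queued_msg *m;

        if (mode == DFSM_MODE_WORK) {
                deliver(data, len);
                return;
        }

        m = malloc(sizeof(*m) + len);
        if (!m)
                return;
        memcpy(m->data, data, len);
        m->len = len;
        m->next = NULL;
        if (queue_tail)
                queue_tail->next = m;
        else
                queue_head = m;
        queue_tail = m;
}

/* all states received and merged: flush the queue and resume */
static void sync_complete(void)
{
        struct queued_msg *m;

        while ((m = queue_head)) {
                queue_head = m->next;
                deliver(m->data, m->len);
                free(m);
        }
        queue_tail = NULL;
        mode = DFSM_MODE_WORK;
}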

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


[Openais] improvements and optimizations

2009-04-14 Thread David Teigland
From one lone, biased, user's point of view, optimized malloc and memcpy are
uninteresting -- message throughput isn't what I'm looking for.  Are there
others out there who see this as important?  I *would* be interested in seeing
improvements in the following areas:

. message latency, if that's even possible

. recovery speed, this seems to be getting worse, things often hang for
  many seconds when nodes join or leave these days

. stability with much shorter token timeouts, we currently use 10 seconds
  as default, and I know corosync should work well with something much
  shorter, it just needs testing/validation along with some diagnostic
  methods to figure out when you're using something too short

. stability with clusters up to 32 nodes, with diagnostic capabilities to
  immediately pinpoint the cause of a breakdown

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] improvements and optimizations

2009-04-14 Thread David Teigland
On Tue, Apr 14, 2009 at 01:18:14PM -0700, Steven Dake wrote:
  . message latency, if that's even possible
 
 Reducing the time a token is held reduces latency.  So memcpy and malloc
 specials do reduce latency.  I don't have measures of how much,
 however.

That would be interesting to measure along with throughput, it's much more
relevant for applications doing coordination or locking via messages.

  . recovery speed, this seems to be getting worse, things often hang for
    many seconds when nodes join or leave these days
  
 With trunk you see 2 second lags?  I agree that the recovery engine
 needs work to allow background synchronization of data sets without
 blocking the entire cluster operation during the synchronization period.

I'll try to measure it, my impression has been that it can be much longer than
2 seconds, but I've not been paying close attention.

  . stability with much shorter token timeouts, we currently use 10 seconds
as default, and I know corosync should work well with something much
shorter, it just needs testing/validation along with some diagnostic
methods to figure out when you're using something too short
 
 yes with 16 nodes it should work with 100msec timeouts as long as the
 kernel doesn't take long locks.  I think all those bugs are fixed in
 dlm/gfs now however.  The 10 seconds in cluster 2 was to work around
 those essentially kernel lockups.  

There were a couple of spots in the kernel that were quickly fixed by calling
schedule, they were never a big problem.  IIRC the timeout was increased to 10
seconds because certain drivers or nics were doing resets which would stall
network i/o.  We were worried that that would be common in user environments,
but I doubt it.

 Perhaps we can get some QE effort around identifying a shorter default.

We need to choose something that works in our testing first.  I think we
should go ahead and change it to something like 2 seconds by default, and see
what happens.
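
For reference, the timeout in question is the token directive (in
milliseconds) in the totem section of corosync.conf; a 2 second setting
would be:

totem {
        version: 2
        token: 2000
}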

  . stability with clusters up to 32 nodes, with diagnostic capabilities to
immediately pinpoint the cause of a breakdown
 
 This is a great idea, but the diags to pinpoint the cause are very
 difficult.  I don't have a clear picture of how they would be designed
 but we have kicked around some ideas.

Yes, this will require some careful analysis, both within corosync and the
networking layers... and extended access to 16-32 nodes.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-04-13 Thread David Teigland
On Thu, Apr 09, 2009 at 06:02:38PM -0700, Steven Dake wrote:
 The issue that Dave is talking about I believe is described in the
 following bugzilla:
 https://bugzilla.redhat.com/show_bug.cgi?id=489451

No, not at all.

 IMO you should get a leave event for any process that leaves the process
 group independent of how totem works underneath.  CPG should provide the
 guarantees you seek, and if it doesn't, it is defective.  

OK, good.  Here's what we expect:

0. configure token timeout to some long time that is longer than all the
   following steps take

1. cluster members are nodeid's: 1,2,3,4

2. cpg foo has the following members:
   nodeid 1, pid 10
   nodeid 2, pid 20
   nodeid 3, pid 30
   nodeid 4, pid 40

3. nodeid 4: ifdown eth0, kill corosync, kill pid 40
   (optionally reboot this node now)

4. nodeid 4: ifup eth0, start corosync

5. members of cpg foo (1:10, 2:20, 3:30) all get a confchg
   showing that 4:40 is not a member

6. nodeid 4: start process pid 41 that joins cpg foo

7. members of cpg foo (1:10, 2:20, 3:30, 4:41) all get a confchg
   showing that 4:41 is a member

(Steps 6 and 7 should work the same even if the process started in step 6 has
pid 40 instead of pid 41.)

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-04-13 Thread David Teigland
On Mon, Apr 13, 2009 at 12:10:33PM -0700, Steven Dake wrote:
 On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote:
  0. configure token timeout to some long time that is longer than all the
 following steps take
  
  1. cluster members are nodeid's: 1,2,3,4
  
  2. cpg foo has the following members:
 nodeid 1, pid 10
 nodeid 2, pid 20
 nodeid 3, pid 30
 nodeid 4, pid 40
  
  3. nodeid 4: ifdown eth0, kill corosync, kill pid 40
 (optionally reboot this node now)
  
  4. nodeid 4: ifup eth0, start corosync
  
  5. members of cpg foo (1:10, 2:20, 3:30) all get a confchg
 showing that 4:40 is not a member
  
  6. nodeid 4: start process pid 41 that joins cpg foo
  
  7. members of cpg foo (1:10, 2:20, 3:30, 4:41) all get a confchg
 showing that 4:41 is a member
  
  (Steps 6 and 7 should work the same even if the process started in step 6
  has pid 40 instead of pid 41.)

 100% agree that is how it should work.  If it doesn't, we will fix it.
 The only thing that may be strange is if pid in step 6 is the same pid
 as 40.  Are you certain the test case which fails has a differing pid at
 step 6?

If you fix step 5, then I suspect steps 6,7 will just work.  After the test
failed at step 5 I didn't pay too much attention to 6,7... but I'm sure that
the pid in step 6 was different (I didn't reboot the node).

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 01:50:18PM +0200, Andrew Beekhof wrote:
 For added fun, a node that restarts quickly enough (think a VM) won't
 even appear to have left (or rejoined) the cluster.
 At the next totem confchg event, it will simply be there again
 with no indication that anything happened.
 
 At least this is true for the raw corosync/openais membership data,
 perhaps CPG can infer this some other way.

Cpg should not let a node go away and come back without notice.  In practice
I'd expect back to back confchg's: one showing it leave and another showing it
join.  As Chrissie mentioned earlier, cpg shouldn't show the same node both
leaving and joining in a single confchg.  In theory I think it would be
legitimate.

Consider a couple examples.
m: member list, j: joined list, l: left list

1. nodes A and B join at once
A gets confchg: m=A,B j=A,B l=
B gets confchg: m=A,B j=A,B l=

2. node C joins
A gets confchg: m=A,B,C j=C l=
B gets confchg: m=A,B,C j=C l=
C gets confchg: m=A,B,C j=C l=

3. node C leaves and quickly rejoins in a single confchg
A gets confchg: m=A,B,C j=C l=C
B gets confchg: m=A,B,C j=C l=C
C gets confchg: m=A,B,C j=C l=C

4. node D joins and quickly leaves (or fails) in a single confchg
A gets confchg: m=A,B,C j=D l=D
B gets confchg: m=A,B,C j=D l=D
C gets confchg: m=A,B,C j=D l=D
D gets confchg: m=A,B,C j=D l=D ?*

* if D does a quick join+leave it may expect to see this confchg showing it in
the joined list, the left list, and not in the member list.

Again, the examples in 3 and 4 are, I think, legitimate in theory.  In
practice it sounds like they won't occur.

If a quick leave+join is guaranteed to be visible through cpg, then it must be
possible to observe at the lower level from raw corosync data.
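
A confchg callback that wanted to notice cases 3 and 4 could compare the
lists directly (a sketch; the function name is mine):

#include <stdio.h>
#include <corosync/cpg.h>

/* flag any node that appears in both the joined and left lists of the
   same confchg (cases 3 and 4 above) */
static void check_quick_rejoin(const struct cpg_address *joined,
                               size_t n_joined,
                               const struct cpg_address *left,
                               size_t n_left)
{
        size_t i, j;

        for (i = 0; i < n_joined; i++)
                for (j = 0; j < n_left; j++)
                        if (joined[i].nodeid == left[j].nodeid)
                                printf("node %u left and joined in one "
                                       "confchg\n", joined[i].nodeid);
}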

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] howto distribute data across all nodes?

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 09:00:08PM +0200, Dietmar Maurer wrote:
  If new, normal read/write messages to the replicated state continue while
  the new node is syncing the pre-existing state, the new node needs to save
  those operations to apply after it's synced.
 
 Ah, that probably works. But can lead to very high memory usage if traffic
 is high.

If that's a problem you could block normal activity during the sync period.

 Is somebody really using that? If so, is there some code available
 (for save/replay)?

There is no general purpose code.  dlm_controld is an example of a program
doing something like this, http://git.fedorahosted.org/git/dlm.git

It uses cpg to replicate state of posix locks, uses checkpoints to sync
existing lock state to new nodes, and saves messages on a new node until it
has completed syncing (i.e. reading pre-existing state from the checkpoint.)
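
The joining node's sequence, in outline (a sketch; all names are invented and
each step is reduced to a comment):

static int saving_messages;

static void join_lockspace(void)
{
        /* cpg_join(); the confchg shows us in the joined list */

        saving_messages = 1;    /* queue incoming cpg messages from here */

        /* saCkptCheckpointOpen() + section iteration: read the
           pre-existing lock state written by the chosen ckpt node */

        saving_messages = 0;

        /* apply the queued messages in order, so our state ends up
           identical to the established members' state */
}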

Dave
___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] detecting cpg joiners

2009-04-09 Thread David Teigland
On Thu, Apr 09, 2009 at 10:12:43PM +0200, Andrew Beekhof wrote:
 On Thu, Apr 9, 2009 at 20:49, Joel Becker joel.bec...@oracle.com wrote:
  On Thu, Apr 09, 2009 at 01:50:18PM +0200, Andrew Beekhof wrote:
  For added fun, a node that restarts quickly enough (think a VM) won't
  even appear to have left (or rejoined) the cluster.
  At the next totem confchg event, It will simply just be there again
  with no indication that anything happened.
 
   This had BETTER not happen.
 
 It does, I've seen it enough times that Pacemaker has code to deal with it.

I'd call that a serious flaw we need to get fixed.  I'll see if I can make it
happen here.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [CRASH] corosync crash under load

2009-03-17 Thread David Teigland
On Tue, Mar 17, 2009 at 02:18:58PM +, Chrissie Caulfield wrote:
 I had three GFS filesystems all mounted on 13 nodes. When I went to
 umount them I got the following crash on 5 nodes of the system:
 
 (gdb) bt
 #0  0x7f21baeb0f05 in raise () from /lib64/libc.so.6
 #1  0x7f21baeb2a73 in abort () from /lib64/libc.so.6
 #2  0x7f21baef0438 in __libc_message () from /lib64/libc.so.6
 #3  0x7f21baef5ec8 in malloc_printerr () from /lib64/libc.so.6
 #4  0x7f21baef8486 in free () from /lib64/libc.so.6
 #5  0x00dabdd2 in messages_free () at totemsrp.c:2233
 #6  message_handler_orf_token (instance=0x7f21b89bd010,
     msg=<value optimized out>, msg_len=<value optimized out>,
     endian_conversion_needed=<value optimized out>) at totemsrp.c:3296
 #7  0x00da4f44 in rrp_deliver_fn (context=0x1064180, msg=0x10824ac,
     msg_len=70) at totemrrp.c:1332
 #8  0x00da2fdf in net_deliver_fn (handle=<value optimized out>,
     fd=<value optimized out>, revents=<value optimized out>,
     data=<value optimized out>) at totemnet.c:687
 #9  0x00da0698 in poll_run (handle=7749363892505018368)
     at coropoll.c:409
 #10 0x00404617 in main (argc=<value optimized out>,
     argv=<value optimized out>) at main.c:687
 (gdb) frame 5
 #5  0x00dabdd2 in messages_free () at totemsrp.c:2233
 2233                    free (regular_message->iovec[j].iov_base);
 (gdb) p j
 $1 = 1
 (gdb) p regular_message->iovec[j].iov_base
 $2 = (void *) 0xde6ebd8b4c81a096
 (gdb)

Here's another similar one while mounting/unmounting:

Program terminated with signal 11, Segmentation fault.
#0  0x7fa0ccf34159 in do_proc_join (name=0x7fffd78743a0, pid=11227,
    nodeid=2, reason=1) at cpg.c:740
740             if (pi->pid == pid && pi->nodeid == nodeid) {
(gdb) bt
#0  0x7fa0ccf34159 in do_proc_join (name=0x7fffd78743a0, pid=11227,
    nodeid=2, reason=1) at cpg.c:740
#1  0x7fa0ccf34499 in message_handler_req_exec_cpg_procjoin
    (message=0x7fffd7874390, nodeid=2) at cpg.c:818
#2  0x00404829 in deliver_fn (nodeid=2, iovec=0x7fffd7874560, iov_len=1,
    endian_conversion_required=0) at main.c:433
#3  0x0031d3616e74 in app_deliver_fn (nodeid=2, iovec=0x7fffd7874540,
    iov_len=1, endian_conversion_required=0) at totempg.c:456
#4  0x0031d3616b68 in totempg_deliver_fn (nodeid=2, iovec=0x82f978,
    iov_len=1, endian_conversion_required=0) at totempg.c:600
#5  0x0031d3615e12 in totemmrp_deliver_fn (nodeid=2, iovec=0x82f978,
    iov_len=1, endian_conversion_required=0) at totemmrp.c:98
#6  0x0031d3613b8f in messages_deliver_to_app (instance=0x7fa0ce7d5010,
    skip=0, end_point=1467) at totemsrp.c:3599
#7  0x0031d3614064 in message_handler_mcast (instance=0x7fa0ce7d5010,
    msg=0x836ffc, msg_len=281, endian_conversion_needed=0) at totemsrp.c:3730
#8  0x0031d3615c32 in main_deliver_fn (context=0x7fa0ce7d5010,
    msg=0x836ffc, msg_len=281) at totemsrp.c:4173
#9  0x0031d3608ee8 in none_mcast_recv (rrp_instance=0x81c580, iface_no=0,
    context=0x7fa0ce7d5010, msg=0x836ffc, msg_len=281) at totemrrp.c:495
#10 0x0031d360aa43 in rrp_deliver_fn (context=0x81ca40, msg=0x836ffc,
    msg_len=281) at totemrrp.c:1343
#11 0x0031d3606faf in net_deliver_fn (handle=7749363892505018368, fd=7,
    revents=1, data=0x836950) at totemnet.c:687
#12 0x0031d360533c in poll_run (handle=7749363892505018368) at coropoll.c:409
#13 0x00405058 in main (argc=2, argv=0x7fffd78770c8) at main.c:687

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] automake merged into corosync

2009-03-10 Thread David Teigland
On Tue, Mar 10, 2009 at 01:41:57AM -0700, Steven Dake wrote:
 ./autogen.sh
 ./configure
 make
 make install DESTDIR=/

Any chance that install could default to DESTDIR=/ ?

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland
On Thu, Oct 30, 2008 at 11:26:14PM -0700, Steven Dake wrote:
 There are two types of messages.  Those intended for users/admins and
 those intended for developers.
 
 Both of these message types should always be recorded *somewhere*.  The
 entire concept of LOG_LEVEL_DEBUG is dubious to me.  If you want to
 stick with that symanetic and definition that is fine, but really a
 LOG_LEVEL_DEBUG means this message is for the developer.  These
 messages should be recorded and stored when a process segfaults, aborts
 due to assertion, or at administrative request.  Since the frequency of
 these messages is high there is no other option for recording them since
 they must _always_ be recorded for the purposes of debugging a field
 failure.  Recording to disk or syslog has significant performance
 impact.
 
 The only solution for these types of messages is to record them into a
 flight recorder buffer which can be dumped:
 1) at segv
 2) at sigabt
 3) at administrative request
 
 This is a fundamental difference in how we have approached logging
 debugging messages in the past but will lead to the ability to ensure we
 _always_ have debug trace data available instead of telling the
 user/admin "Go turn on debug and hope you can reproduce that error, and
 btw since 10k messages are logged your disk will fill up with
 irrelevant debug messages and your system will perform like mud."
 
 Logging these in memory is the only solution that I see as suitable and
 in all cases they should be filtered from any output source such as
 stderr, file, or syslog.

There's a difference between high volume trace debug data stored in
memory, and low volume informational debug data that can be easily written
to a file.  Both kinds of data can be useful.

My programs are simple enough that low volume informational debug data is
enough for me to identify and fix a problem.  So, low volume informational
data is all I produce.  It can be useful to write this data to a file.

Your program is complex enough that high volume trace debug data is
usually needed for you to identify and fix a problem.  So, high volume
trace data is all you produce.  This is too much data to write to a file
(by the running program).

So, we're using DEBUG to refer to different things.  We need to define
two different levels (just for clarity in this discussion):
. DEBUGLO is low volume informational data like I use
. DEBUGHI is high volume trace data like you use

DEBUGHI messages wouldn't ever be logged to files by the program while
running.  DEBUGLO messages could be, though, if the user configured it.
So, circling back around, how should a user configure DEBUGLO messages to
appear in syslog or a logfile?   In particular, what would they enter in
the cluster.conf logging/ section?  My suggestion is:

  syslog_level=foo
  logfile_level=bar

where foo and bar are one of the standard priority names in syslog.h.
So, if a user wanted DEBUGLO messages to appear in daemon.log, they'd set

  logging/daemon/logfile_level=debug

and if they wanted DEBUGLO messages to appear in /var/log/messages,

  logging/daemon/syslog_level=debug

(Note that debug means DEBUGLO here because DEBUGHI messages are only
saved in memory, not to files by a running program.)
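
As an illustration only (this just mirrors the paths above, it is not a
final schema), the cluster.conf fragment might read:

<logging>
  <daemon name="fenced" syslog_level="info" logfile_level="debug"/>
</logging>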

There's another separate question I have about corosync, and that's
whether you could identify some limited number of messages that would be
appropriate for DEBUGLO?  They would be used by non-experts to do some
rough debugging of problems, and by experts to narrow down a problem
before digging into the high volume trace data.  I'd suggest that a good
starting point for DEBUGLO would be the data that openais has historically
put in /var/log/messages.  Data that helps you quickly triage a problem
(or verify that things are happening correctly) without stepping through
all the trace data.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland
On Tue, Nov 04, 2008 at 02:58:47PM -0600, David Teigland wrote:
 the cluster.conf logging/ section?  My suggestion is:
 
   syslog_level=foo
   logfile_level=bar

FWIW, I'm not set on this if someone has a better suggestion.  I just want
something unambiguous.  debug=on has been shown to mean something
different to everyone.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] [RFC] simple blackbox

2008-10-09 Thread David Teigland
 Wow that is a complicated solution.  I thought that simple and blackbox
 went well together. 

Completely agree, too complex.  The logging code I copy into all the
daemons I write is at the opposite end of the spectrum; I doubt it's
possible to be much simpler.  (I copy it everywhere because it's too short
and simple to bother with a lib.)

#define DUMP_SIZE (1024 * 1024)

extern char dump_buf[DUMP_SIZE];
extern int dump_point;
extern int dump_wrap;

extern char daemon_debug_buf[256];

void daemon_dump_save(void)
{
        int len, i;

        len = strlen(daemon_debug_buf);

        for (i = 0; i < len; i++) {
                dump_buf[dump_point++] = daemon_debug_buf[i];

                if (dump_point == DUMP_SIZE) {
                        dump_point = 0;
                        dump_wrap = 1;
                }
        }
}

#define log_debug(fmt, args...) \
do { \
        snprintf(daemon_debug_buf, 255, "%ld " fmt "\n", time(NULL), ##args); \
        daemon_dump_save(); \
} while (0)


That's it, just over 20 lines.  I also have a function that will write
dump_buf over a unix socket so a command line program can see it while the
daemon is running (that's the only way I ever use it, actually).  This is
non-threaded, of course, and corosync will need something more complex,
but the point is you can keep it simple.
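
The socket piece is about as small (a sketch; it assumes an already-connected
unix socket fd, and error handling and short-write retries are omitted):

#include <unistd.h>

/* send the ring buffer contents, oldest data first */
void daemon_dump_send(int fd)
{
        if (dump_wrap)
                write(fd, dump_buf + dump_point, DUMP_SIZE - dump_point);
        write(fd, dump_buf, dump_point);
}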

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] Split brain when using EVS library

2008-09-09 Thread David Teigland
On Tue, Sep 09, 2008 at 12:27:34PM +0200, Arne Eriksson R wrote:
 Hi,
 We have a cluster with 6 processors using openais stable version 0.80.3.
 
 For some reason our cluster splits up into two rings.
 Scenario is:
 node1(n1) n2 n3 n4 n5 n6 are in the ring.
 
 Suddenly the ring splits into two rings:
 n1 n2 n3 got leave msg from n4 n5 n6
 n4 n5 n6 got leave msg from n1 n2 n3
 
 After a few milliseconds the two rings joins again:
 n1 n2 n3 got join msg from n4 n5 n6
 n4 n5 n6 got join msg from n1 n2 n3
 
 The two ring is joined to one ring again:
 node1(n1) n2 n3 n4 n5 n6 are in the ring.

We at RH have struggled a great deal with this exact feature for quite a
long time.  It's the biggest problem by far that we've had using openais.

 The question is if this is a normal scenario from EVS in the openais
 implementation?
 
 The problem is that the application needs to detect the difference
 between two kinds of joins: The normal join where the two rings/nodes
 join for the first time and the abnormal joins where a ring has split
 and re-joined (without any nodes being restarted). The first case
 typically requires only a sync of some nodes (bringing the history up to
 date). The second case requires a merger, i.e. selection of a losing
 side and the loser discarding the loser's history.

Our applications (cman, dlm, gfs, etc using libcpg) need to make this same
distinction:  a join from a clean state where aisexec was just started,
vs a join from a dirty state where the cluster experienced a transient
partition (i.e. nodes split into two clusters and then aisexec
automatically merged the two clusters back together again.)

We've had to add the ability for our applications to detect that this has
happened by sending messages containing the state of the app.  And it
makes things quite a bit more complicated than they should be.

Dave

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] logsys patch

2008-07-02 Thread David Teigland
On Tue, Jul 01, 2008 at 03:11:26PM -0700, Steven Dake wrote:
 Dave,
 
 Your patch looks reasonable but has a few issues which need to be
 addressed.

 It doesn't address the setting of logsys_subsys_id but defines it.  I
 want to avoid the situation where logsys_subsys_id is defined, but then
 not set.  What I suggest here is to set logsys_subsys_id to some known
 value (-1) and assert if the subsystem id is that value within
 log_printf to help developers catch this scenario.  At the moment the
 current API enforces proper behavior (it wont link if the developer does
 the wrong thing).  With your patch it will link, but may not behave
 properly sending log messages to the wrong subsystem (0) instead of the
 subsystem desired by the developer.  This is why the macros are there
 (to set the subsystem id and define it).  Your patch addresses the
 removal of the definition to a generic location but doesn't address at
 all the setting of the subsystem id.

Good thought, done.

 The logsys_exit function doesn't need to be added.  Instead this is
 managed by logsys_atsegv and logsys_flush.  If you desire, you can keep
 logsys_exit and have it call logsys_flush which is the proper thing to
 do on exit.

OK, added flush to exit, and reset logsys_subsys_id back to -1.

 Please follow the coding style guidelines (ie: match the rest of the
 logsys code) so I don't have to rework your patch before commit.

OK, new patch attached.

Dave

Index: logsys.c
===
--- logsys.c(revision 1568)
+++ logsys.c(working copy)
@@ -632,3 +632,41 @@
 {
worker_thread_group_wait (log_thread_group);
 }
+
+int logsys_init (char *name, int mode, int facility, int priority, char *file)
+{
+   char *errstr;
+
+   logsys_subsys_id = 0;
+
+   strncpy (logsys_loggers[0].subsys, name,
+sizeof (logsys_loggers[0].subsys));
+   logsys_config_mode_set (mode);
+   logsys_config_facility_set (name, facility);
+   logsys_config_file_set (&errstr, file);
+   _logsys_config_priority_set (0, priority);
+   if ((mode & LOG_MODE_BUFFER_BEFORE_CONFIG) == 0) {
+   _logsys_wthread_create ();
+   }
+   return (0);
+}
+
+int logsys_conf (char *name, int mode, int facility, int priority, char *file)
+{
+   char *errstr;
+
+   strncpy (logsys_loggers[0].subsys, name,
+   sizeof (logsys_loggers[0].subsys));
+   logsys_config_mode_set (mode);
+   logsys_config_facility_set (name, facility);
+   logsys_config_file_set (&errstr, file);
+   _logsys_config_priority_set (0, priority);
+   return (0);
+}
+
+void logsys_exit (void)
+{
+   logsys_subsys_id = -1;
+   logsys_flush ();
+}
+
Index: logsys.h
===
--- logsys.h(revision 1568)
+++ logsys.h(working copy)
@@ -170,8 +170,9 @@
}   \
 }
 
+static unsigned int logsys_subsys_id __attribute__((unused)) = -1; \
+
 #define LOGSYS_DECLARE_NOSUBSYS(priority)  \
-static unsigned int logsys_subsys_id __attribute__((unused));  \
 __attribute__ ((constructor)) static void logsys_nosubsys_init (void)  \
 {  \
_logsys_nosubsys_set(); \
@@ -180,7 +181,6 @@
 }
 
 #define LOGSYS_DECLARE_SUBSYS(subsys,priority) \
-static unsigned int logsys_subsys_id __attribute__((unused));  \
 __attribute__ ((constructor)) static void logsys_subsys_init (void)\
 {  \
logsys_subsys_id =  \
@@ -188,6 +188,7 @@
 }
 
 #define log_printf(lvl, format, args...) do {  \
+   assert (logsys_subsys_id != -1);\
   if ((lvl) <= logsys_loggers[logsys_subsys_id].priority) {   \
_logsys_log_printf2 (__FILE__, __LINE__, lvl,   \
logsys_subsys_id, (format), ##args);\
@@ -195,6 +196,7 @@
 } while(0)
 
 #define dprintf(format, args...) do {  \
+   assert (logsys_subsys_id != -1);\
   if (LOG_LEVEL_DEBUG <= logsys_loggers[logsys_subsys_id].priority) { \
_logsys_log_printf2 (__FILE__, __LINE__, LOG_DEBUG, \
logsys_subsys_id, (format), ##args);\
@@ -202,6 +204,7 @@
 } while(0)
 
 #define ENTER_VOID() do {  \
+   assert (logsys_subsys_id != -1);\
   if (LOG_LEVEL_DEBUG <= logsys_loggers[logsys_subsys_id].priority) { \
_logsys_trace (__FILE__, __LINE__, LOGSYS_TAG_ENTER,\

Re: [Openais] new config system

2008-03-26 Thread David Teigland
On Wed, Mar 26, 2008 at 11:57:59AM -0400, Lon Hohberger wrote:
 On Wed, 2008-03-26 at 10:32 -0500, David Teigland wrote:
 
  
  [1] Just to be clear, the meta-configuration idea is where a variety of
  config files can be used to populate a central config-file-agnostic
  respository.  A single interface is used by all to read config data from
  the repository.  Even if we did this, I don't see what it would give us
  anything.  All our existing applications access data that's only specified
  in a single config file anyway, so interchangable back-end files would be
  an unused feature.
 
 True, it doesn't give _us_ much to be agnostic to what the config file
 format looks like.
 
 However, with different back-ends used to populate the single config
 repo at run-time, we then have the ability to not have config files at
 all (well, except the meta-config stuff).
 
 What I mean is: An administrator might like to store the cluster
 configuration in an inventory database which isn't local to the cluster
 itself (e.g. LDAP, or whatever).  This might not be a requirement now,
 but that was one of the points of having multiple config back-ends,
 IIRC.

That's what I had in mind with the "other?" arrow pointing up at
libcmanconf.  Multiple back-ends for libcmanconf is one thing (good,
simple); multiple back-ends for a meta-configuration database with a
meta-API is what I've become skeptical about.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais


Re: [Openais] new config system

2008-03-26 Thread David Teigland
On Wed, Mar 26, 2008 at 10:32:54AM -0500, David Teigland wrote:
 A while back I drew this diagram to show what we were aiming to design, in
 broad terms, for the next generation aisexec/cman config system:
 
   http://people.redhat.com/teigland/cman3.jpg
 
 I think perhaps that diagram attempts to do too much, and I've drawn
 another:
 
   http://people.redhat.com/teigland/cman3b.jpg
 
 The big problem I see with the first diagram is that it tries to use objdb
 to solve the meta-configuration problem [1].  That's a hard problem, I'm
 not sure objdb is the right place to solve it, I don't think we have
 enough information to solve it properly right now, and I don't see that we
 have a pressing need to solve it right now.  So, the second diagram steps
 back to what Fabio has already implemented, more or less.

There were quite a few things wrong in the cman3b diagram, so based on the
explanation from Chrissie and Fabio, here's another:

http://people.redhat.com/teigland/cman3c.jpg

(The "assumes" comments don't mean it would be impossible to use one lib
with a different config plugin, but that it wouldn't make sense to do so
in practice.)

 Lon pointed out another problem with the first diagram, and that's that we
 want to be able to read config values without openais running, and running
 properly.  That's one of the things we were trying to get away from with
 ccsd.

The cman3c diagram does not solve this problem, but it could by caching a
local copy of the config data to use when aisexec is not running.

___
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais