On Mon, 2009-04-13 at 13:35 -0500, David Teigland wrote:
> On Thu, Apr 09, 2009 at 06:02:38PM -0700, Steven Dake wrote:
> > The issue that Dave is talking about I believe is described in the
> > following bugzilla:
> > https://bugzilla.redhat.com/show_bug.cgi?id=489451
>
> No, not at all.
>
> > IMO you should get a leave event for any process that leaves the process
> > group independent of how totem works underneath. CPG should provide the
> > guarantees you seek, and if it doesn't, it is defective.
>
> OK, good. Here's what we expect:
>
> 0. configure token timeout to some long time that is longer than all the
>    following steps take
>
> 1. cluster members are nodeid's: 1,2,3,4
>
> 2. cpg foo has the following members:
>    nodeid 1, pid 10
>    nodeid 2, pid 20
>    nodeid 3, pid 30
>    nodeid 4, pid 40
>
> 3. nodeid 4: ifdown eth0, kill corosync, kill pid 40
>    (optionally reboot this node now)
>
> 4. nodeid 4: ifup eth0, start corosync
>
> 5. members of cpg foo (1:10, 2:20, 3:30) all get a confchg
>    showing that 4:40 is not a member
>
> 6. nodeid 4: start process pid 41 that joins cpg foo
>
> 7. members of cpg foo (1:10, 2:20, 3:30, 4:41) all get a confchg
>    showing that 4:41 is a member
>
> (Steps 6 and 7 should work the same even if the process started in step 6 has
> pid 40 instead of pid 41.)
>
> Dave

100% agree that is how it should work. If it doesn't, we will fix it.

The only thing that may be strange is if the pid in step 6 is the same as
the old pid (40). Are you certain the failing test case has a different pid
at step 6?

This points out a weakness in the current cpg protocol. It could be
addressed by adding the pid's start time to the multicast message, so that a
process restarted on the same node with the same pid can be told apart from
the old one. Unfortunately this would have to be done in some backward
compatible fashion.

Regards
-steve
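
To illustrate the pid-reuse ambiguity, here is a rough Python sketch. It is a toy model and not corosync code: `Member`, `start_time`, and `diff` are made-up names, and the real cpg protocol exchanges join/leave messages rather than comparing sets. It only shows why a member identified by (nodeid, pid) alone cannot be told apart from a restarted process that reuses the pid, while adding a start time makes the two distinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    nodeid: int
    pid: int
    # Hypothetical extra field; the current cpg messages do not carry it.
    start_time: int = 0


def diff(old, new):
    """Return (left, joined) between two membership sets."""
    return old - new, new - old


others = [(1, 10), (2, 20), (3, 30)]

# Identity is (nodeid, pid) only: a restart on node 4 that reuses pid 40
# yields an identical membership set, so no leave or join is visible.
before = {Member(n, p) for n, p in others} | {Member(4, 40)}
after = {Member(n, p) for n, p in others} | {Member(4, 40)}
left_plain, joined_plain = diff(before, after)

# Identity is (nodeid, pid, start_time): the same restart now shows up
# as one member leaving and a different member joining.
before_t = {Member(n, p, 100) for n, p in others} | {Member(4, 40, 100)}
after_t = {Member(n, p, 100) for n, p in others} | {Member(4, 40, 250)}
left_t, joined_t = diff(before_t, after_t)

print("without start time:", left_plain, joined_plain)
print("with start time:   ", left_t, joined_t)
```

Whether a start time, a per-join counter, or some other tag is the right disambiguator is an open design choice; the sketch only assumes that the extra value differs between the old and the new process.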