On Tue, Sep 09, 2008 at 12:27:34PM +0200, Arne Eriksson R wrote:
> Hi,
> We have a cluster with 6 processors using openais stable version 0.80.3.
>
> For some reason our cluster splits up into two rings.
> Scenario is:
> node1(n1) n2 n3 n4 n5 n6 are in the ring.
>
> Suddenly the ring splits into two rings:
> n1 n2 n3 got leave msg from n4 n5 n6
> n4 n5 n6 got leave msg from n1 n2 n3
>
> After a few milliseconds the two rings join again:
> n1 n2 n3 got join msg from n4 n5 n6
> n4 n5 n6 got join msg from n1 n2 n3
>
> The two rings are joined into one ring again:
> node1(n1) n2 n3 n4 n5 n6 are in the ring.
We at RH have struggled a great deal with this exact "feature" for quite a
long time.  It's the biggest problem by far that we've had using openais.

> The question is if this is a normal scenario from EVS in the openais
> implementation?
>
> The problem is that the application needs to detect the difference
> between two kinds of joins: The "normal" join where the two rings/nodes
> join for the first time and the "abnormal" joins where a ring has split
> and re-joined (without any nodes being restarted). The first case
> typically requires only a sync of some nodes (bringing the history up to
> date). The second case requires a merger, i.e. selection of a losing
> side and the loser discarding the loser's history.

Our applications (cman, dlm, gfs, etc., using libcpg) need to make this same
distinction: a join from a "clean" state, where aisexec was just started, vs.
a join from a "dirty" state, where the cluster experienced a transient
partition (i.e. nodes split into two clusters and then aisexec automatically
merged the two clusters back together again).

We've had to add the ability for our applications to detect that this has
happened by sending messages containing the state of the app, and it makes
things quite a bit more complicated than they should be.

Dave
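
As a rough illustration of the state-exchange idea described above (this is
not the actual cman/dlm/gfs code, and it does not use libcpg): after a
membership change, each node broadcasts a small message saying whether it
was just started or already holds application state. Every name here
(StateMsg, started_clean, classify_joins) is hypothetical and chosen only
for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateMsg:
    """Hypothetical per-node message exchanged after a membership change."""
    nodeid: int
    started_clean: bool  # True if the node was just started and holds no app state


def classify_joins(old_members, new_members, state_msgs):
    """Return {nodeid: "clean" | "dirty" | "unknown"} for every node that joined.

    "clean":   the joiner reports that it was freshly started (normal join).
    "dirty":   the joiner already holds state, so it was in another partition
               that has merged back (the case that needs a loser to be picked).
    "unknown": no state message has been received from the joiner yet.
    """
    joined = set(new_members) - set(old_members)
    by_node = {msg.nodeid: msg for msg in state_msgs}
    result = {}
    for nodeid in sorted(joined):
        msg = by_node.get(nodeid)
        if msg is None:
            result[nodeid] = "unknown"
        elif msg.started_clean:
            result[nodeid] = "clean"
        else:
            result[nodeid] = "dirty"
    return result
```

In the scenario from the original report, n1 n2 n3 would call
classify_joins({1, 2, 3}, {1, 2, 3, 4, 5, 6}, msgs) and, since n4 n5 n6 were
never restarted, see all three joiners reported as "dirty".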