I wrote a new program, cpgx, to test the virtual synchrony (VS) guarantees of
corosync and cpg:

http://fedorapeople.org/gitweb?p=teigland/public_git/dct-stuff.git;a=summary

It joins a cpg, then randomly sends messages, leaves, or exits, and repeats.
This creates a random sequence of messages and configuration changes
(events).  Every node keeps a history of all events and continually compares
its history against everyone else's.  This event history is the replicated
state of the program, upon which all future state is based, and which needs to
be synced to a node when it joins (state transfer).  If any node sees a
different event sequence or content from another (violating VS), it should be
detected quickly, and it should be easy to see exactly what went wrong.
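
To make the history comparison concrete, here is a minimal sketch in Python.
It is not the cpgx code (cpgx is a separate program in the repository above);
the names Event, first_divergence and check_histories are hypothetical, and
it assumes every history starts from the same first event, so a node that has
simply seen fewer events so far is not treated as diverged.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One entry in a node's event history (hypothetical layout)."""
    kind: str     # "msg" for a message, "confchg" for a configuration change
    nodeid: int   # sender of the message, or the node that joined/left
    payload: str  # message content, or "join"/"leave" for a confchg


def first_divergence(a: List[Event], b: List[Event]) -> Optional[int]:
    """Return the index of the first differing event, or None.

    Only the overlapping prefix is compared, so a shorter history that
    agrees so far does not count as a violation.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def check_histories(histories: Dict[int, List[Event]]) -> List[str]:
    """Compare every node's history against the lowest nodeid's history."""
    errors = []
    nodes = sorted(histories)
    ref = nodes[0]
    for other in nodes[1:]:
        i = first_divergence(histories[ref], histories[other])
        if i is not None:
            errors.append(
                f"ERROR: node {ref} and node {other} differ at event {i}: "
                f"{histories[ref][i]} vs {histories[other][i]}"
            )
    return errors
```

Reporting the index and both events at the first mismatch is what makes a
violation easy to inspect: everything before that index is known to agree.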

It's simple to run: just start cpgx on up to 8 nodes running corosync, one
instance per node; nodes must have nodeids between 1 and 255.  If there's a
problem, it will stop running with an ERROR message.

It only tries to prove VS behavior, but it incidentally tests other aspects of
corosync as well; e.g., it quickly reproduces this recent regression:
https://lists.linux-foundation.org/pipermail/openais/2009-May/012138.html

With the non-default -d1 option it will include approximated node failures in
the random mix of events by periodically killing corosync and restarting it
with cman_tool.  (I may later use iptables to simulate more realistic node
failures.)  It's not the default because it often causes corosync to hang,
apparently due to one of those other incidental bugs.

Dave

