I wrote a new program, cpgx, to test the virtual synchrony (VS) guarantees of corosync and cpg:
http://fedorapeople.org/gitweb?p=teigland/public_git/dct-stuff.git;a=summary

It joins a cpg, then randomly sends messages, leaves or exits, and repeats. Together this produces a random sequence of messages and configuration changes (events). Every node keeps a history of all events and continually compares its history against everyone else's. This event history is the replicated state of the program: all future state is based on it, and it must be synced to a node when that node joins (state transfer). If any node sees a different sequence of events, or different event content, from another node (a violation of VS), the difference should be detected quickly, and it should be easy to see exactly what was wrong.

It's simple to run: just start cpgx on up to 8 nodes running corosync, one instance per node. Nodes must have nodeids between 1 and 255. If there's a problem, it will stop with an ERROR message.

It only tries to verify VS behavior, but it incidentally tests other aspects of corosync as well; e.g. it quickly reproduces this recent regression:
https://lists.linux-foundation.org/pipermail/openais/2009-May/012138.html

With the non-default -d1 option, it also includes approximated node failures in the random mix of events, by periodically killing corosync and restarting it with cman_tool. (I may later use iptables to simulate more realistic node failures.) It's not the default because it often causes corosync to hang, apparently another of those incidental bugs.

Dave

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
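The core check in cpgx's description, each node comparing its event history against every other node's, might look roughly like the following C sketch. Everything in it is a hypothetical illustration and not taken from cpgx's source: the struct event layout, the first_divergence and check_history helpers, and the ERROR message format are invented for the example, and how histories are actually exchanged over cpg is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event record (NOT cpgx's real format): one entry per
 * delivered message or configuration change. */
enum event_type { EV_MSG, EV_JOIN, EV_LEAVE };

struct event {
    enum event_type type;
    uint32_t nodeid;   /* sender of a message, or node whose membership changed */
    uint32_t checksum; /* checksum of the message payload; 0 for config changes */
};

/* Compare two event histories over their common prefix.
 * Returns the index of the first differing event, or -1 if the common
 * prefix matches. A shorter history is assumed to belong to a node that
 * simply has not delivered the latest events yet. */
static long first_divergence(const struct event *a, size_t alen,
                             const struct event *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;

    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        if (a[i].type != b[i].type ||
            a[i].nodeid != b[i].nodeid ||
            a[i].checksum != b[i].checksum)
            return (long)i;
    }
    return -1;
}

/* Check our history against a peer's; on a mismatch, print which event
 * differs (hypothetical message format) and return 1, else return 0. */
static int check_history(uint32_t peer,
                         const struct event *mine, size_t mylen,
                         const struct event *theirs, size_t theirlen)
{
    long i = first_divergence(mine, mylen, theirs, theirlen);

    if (i < 0)
        return 0;
    fprintf(stderr,
            "ERROR: history differs from node %u at event %ld: "
            "local type %d nodeid %u, peer type %d nodeid %u\n",
            (unsigned)peer, i,
            (int)mine[i].type, (unsigned)mine[i].nodeid,
            (int)theirs[i].type, (unsigned)theirs[i].nodeid);
    return 1;
}
```

Comparing only the common prefix is a design choice of this sketch: a node that is merely behind is not a VS violation, whereas any difference inside the shared prefix means two members saw a different event sequence or different content, and reporting the first differing index makes it easy to see exactly where they diverged.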