A tip for those who come this way after us: we found a large part of the problem was that the cluster nodes rely on being in constant communication.
If one of them is under high load (say, running reports), its CPU usage may be so high that it does not respond to the cluster ping quickly enough (within 3 seconds). The cluster then treats it as dead and removes it from the cluster, even though it is not dead, just busy. We increased the org.jgroups.protocols.pbcast.GMS timeout and it helped a great deal.
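
To make the failure mode concrete, here is a rough sketch in plain Java. It is not JGroups code and does not use the JGroups API. The class name, method name, node names and response times are all made up for illustration. It only shows the general idea: a member that answers the ping later than the allowed timeout gets dropped, and a longer timeout lets a busy but healthy member stay in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of timeout-based eviction; not JGroups code. */
public class HeartbeatTimeoutSketch {

    /**
     * Returns the members the cluster would treat as dead, given how long
     * each one took to answer the last ping and the timeout allowed.
     */
    static List<String> suspected(String[] members, long[] responseMillis, long timeoutMillis) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < members.length; i++) {
            // A slow answer is indistinguishable from no answer once the timeout passes.
            if (responseMillis[i] > timeoutMillis) {
                out.add(members[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] members = {"node-a", "node-b", "node-c"};
        // Assumed numbers: node-b is busy running reports and answers after 4.5 s.
        long[] responseMillis = {40, 4500, 60};

        System.out.println("3 s timeout:  " + suspected(members, responseMillis, 3000));
        System.out.println("10 s timeout: " + suspected(members, responseMillis, 10000));
    }
}
```

With the 3 second timeout node-b is reported as suspected; with the 10 second timeout nobody is. The trade-off is that a node that really has died also takes longer to be noticed, so the value probably wants to sit just above the worst response time you see under load.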