On 07/18/2011 10:38 AM, Jed Smith wrote:
> Thank you for your reply.
>
> On Mon, Jul 18, 2011 at 1:18 PM, Digimer <li...@alteeve.com> wrote:
>> Is it possible that the switch dropped the multicast group, and didn't
>> reform it fast enough to prevent the cluster from partitioning?
>
> Our network guy says that the switches do not look at multicast
> traffic, they merely broadcast it in our environment.
>

Unlikely. I expect what is happening is that your switch is delaying
multicast packets relative to the unicast token, which causes
retransmits. There is a bug in older versions of our totem
implementation that increases the fail to recv counter incorrectly. In
newer versions we have worked around this flaw in the original totem
specification (which expects that multicast can be flushed before a
token receipt, which is an invalid assertion).

My recommendation to you is to update to a 1.3 or 1.4 series release.
Both of these have very tight maintenance rules around what goes in
(i.e., it's not tip development work). Once you have a version that
doesn't have known bugs, I'd recommend increasing fail_recv_const to
some large value, such as 5000.

See:
http://www.mail-archive.com/openais@lists.linux-foundation.org/msg05924.html

It would be nice if the Debian maintainers would update their packages
to the latest upstream. We release z streams for a reason (usually the
reason being that someone has had a field failure resulting in a
complete cluster outage). Y stream releases are a bit more liberal in
terms of additional features.

File a bug with your distro and ask them to use an upstream release
which is recent and supported upstream (1.2.y upstream support fell off
once we released 1.4.y; we support 2 y streams).

Thanks
-steve

> Thanks,
>

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
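
For reference, the fail_recv_const change suggested above could look roughly like the fragment below. This is only a sketch, assuming the usual corosync.conf layout with a totem section. The value 5000 is the one suggested in this message; everything else is a placeholder, and any existing totem settings (interface, token and so on) should be left as they are.

```
# /etc/corosync/corosync.conf (fragment; path assumed, may differ per distro)
totem {
        version: 2

        # Raise the number of token rotations without receiving expected
        # multicast messages before a new configuration is formed.
        # 5000 is the example value from this thread, not a tuned number.
        fail_recv_const: 5000

        # ... keep your existing interface and other totem settings here ...
}
```

The change would need to be applied on every node, with corosync restarted afterwards for it to take effect.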