On 2012-10-16T23:04:59, RaSca <ra...@miamammausalinux.org> wrote:

> Hi all,
> I hope that you can help me with this strange problem. I've got a nine
> node cluster which is configured with no-quorum-policy to stop.
> Two days ago I came across this error on one of the nodes:
> 
> Oct 14 00:00:38 kvm06 kernel: Uhhuh. NMI received for unknown reason a1
> on CPU 0.
> Oct 14 00:00:38 kvm06 kernel: You have some hardware problem, likely on
> the PCI bus.
> Oct 14 00:00:38 kvm06 kernel: Dazed and confused, but trying to continue
> Oct 14 00:00:43 kvm06 corosync[2027]:   [TOTEM ] A processor failed,
> forming new configuration.

Depending on what kind of problem this node has, it could erratically
affect the timing of network messages, or even send garbage, either of
which has the potential to seriously disrupt the totem protocol.
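
For a first impression of how closely the membership change follows the
NMI messages, something like the rough sketch below could be run over the
syslog of each node. It is not part of any corosync tooling; the function
names (parse, nmi_to_totem_delays) are made up for illustration. It
assumes classic syslog lines that are not wrapped the way they are in the
quote above, and a year supplied by hand, since syslog timestamps carry
none.

```python
import re
from datetime import datetime

# Assumed line shape (classic syslog, unwrapped), e.g.
#   "Oct 14 00:00:38 kvm06 kernel: Uhhuh. NMI received ..."
#   "Oct 14 00:00:43 kvm06 corosync[2027]:   [TOTEM ] A processor failed, ..."
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s*(.*)$"
)


def parse(line, year=2012):
    """Split one syslog line into (timestamp, host, program, message).

    Returns None for lines that do not match the assumed format.
    The year is an assumption passed in by the caller.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, host, prog, msg = m.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, prog, msg


def nmi_to_totem_delays(lines, year=2012):
    """Return (host, seconds) pairs: the time from the most recent kernel
    NMI message on a host to that host's next corosync
    'A processor failed' message."""
    last_nmi = {}
    delays = []
    for line in lines:
        rec = parse(line, year)
        if rec is None:
            continue
        when, host, prog, msg = rec
        if prog == "kernel" and "NMI received" in msg:
            last_nmi[host] = when
        elif prog == "corosync" and "A processor failed" in msg and host in last_nmi:
            delta = when - last_nmi.pop(host)
            delays.append((host, delta.total_seconds()))
    return delays


if __name__ == "__main__":
    # The two relevant lines from the report, joined back into single lines.
    sample = [
        "Oct 14 00:00:38 kvm06 kernel: Uhhuh. NMI received for unknown reason a1 on CPU 0.",
        "Oct 14 00:00:43 kvm06 corosync[2027]:   [TOTEM ] A processor failed, forming new configuration.",
    ]
    print(nmi_to_totem_delays(sample))
```

A short gap between the two events only hints at a connection. It does
not replace looking at the full logs from all nodes.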

What corosync version do you have?

And yes, this is impossible to diagnose without the full cluster logs
etc. A good candidate for bugzilla.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
