On 07/05/2011 07:26 AM, Vladislav Bogdanov wrote:
> Hi all,
> 
> Last days I see following messages in logs:
> [TOTEM ] Process pause detected for XXX ms, flushing membership messages.
> 
> After that ring is quickly re-established.
> DLM/clvmd notifies this and switches to kern_stop waiting for fencing to
> be done. Although what dlm_tool ls provides is really strange flags and
> members differ between nodes. I have dumps of what has been happening in
> dlm, and there are messages that fencing was done!
> 
> On the other hand, pacemaker does not notify anything so fencing is not
> done. This is rather strange, but for another list.
> 
> Can anybody please explain what exactly that message means and what is
> the correct reaction of upper services should be?
> Can it be solely caused by network problems?
> Can number of buffers in RX ring of ethernet card influence this (I did
> some tuning there some time ago)?
> 
> corosync 1.3.1, UDPU transport.
> pacemaker-1.1-devel
> dlm_controld.pcmk from 3.0.17
> clvmd 2.02.85
> clusterlib-3.1.1
> 

This indicates that either the kernel has paused scheduling of corosync,
or corosync itself has blocked, for the time value printed in the
message. Corosync is non-blocking.
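
To illustrate the general idea of detecting a process pause, here is a
minimal Python sketch. It is not corosync's actual code (corosync is
written in C); the function name, parameters and threshold are
hypothetical. It only shows the principle: a process that expects to
wake up at a regular interval compares timestamps and reports any gap
that is much longer than expected.

```python
import time


def detect_pauses(interval_s, threshold_s, iterations,
                  sleep=time.sleep, clock=time.monotonic):
    """Return the lengths (in ms) of wake-up gaps longer than threshold_s.

    Hypothetical helper for illustration only: it sleeps for interval_s
    between checks and records every gap between two consecutive
    wake-ups that exceeds threshold_s.
    """
    pauses_ms = []
    last = clock()
    for _ in range(iterations):
        sleep(interval_s)
        now = clock()
        gap = now - last
        if gap > threshold_s:
            # The process was not scheduled (or was blocked) for
            # longer than expected.
            pauses_ms.append(round(gap * 1000))
        last = now
    return pauses_ms


if __name__ == "__main__":
    # Check every 50 ms and report gaps longer than 500 ms.
    for ms in detect_pauses(0.05, 0.5, 10):
        print("Process pause detected for %d ms" % ms)
```

On an idle bare-metal machine this would normally print nothing; on a
heavily loaded host, or inside a VM that is starved of CPU time, gaps
above the threshold may show up.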

Are you running inside a VM?  Increasing token is probably a necessity
when running inside a VM on a heavily loaded host because kvm does not
schedule as fairly as bare metal.
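
As a rough sketch, raising the token timeout means setting the token
option in the totem section of corosync.conf. The value below is only
an illustrative assumption, not a recommendation; the rest of the
existing totem section is assumed to stay unchanged, and the same
setting should be applied on every node.

```
totem {
    version: 2
    # Token timeout in milliseconds (example value, tune for your load).
    token: 10000
}
```

A larger token timeout tolerates longer scheduling pauses, at the cost
of slower detection of a node that has really failed.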

Please let us know whether this is bare metal or a VM.

Regards
-steve
> Best,
> Vladislav
> _______________________________________________
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
