On 07/05/2011 07:26 AM, Vladislav Bogdanov wrote:
> Hi all,
>
> Last days I see following messages in logs:
> [TOTEM ] Process pause detected for XXX ms, flushing membership messages.
>
> After that ring is quickly re-established.
> DLM/clvmd notifies this and switches to kern_stop waiting for fencing to
> be done. Although what dlm_tool ls provides is really strange flags and
> members differ between nodes. I have dumps of what has been happening in
> dlm, and there are messages that fencing was done!
>
> On the other hand, pacemaker does not notify anything so fencing is not
> done. This is rather strange, but for another list.
>
> Can anybody please explain what exactly that message means and what is
> the correct reaction of upper services should be?
> Can it be solely caused by network problems?
> Can number of buffers in RX ring of ethernet card influence this (I did
> some tuning there some time ago)?
>
> corosync 1.3.1, UDPU transport.
> pacemaker-1.1-devel
> dlm_controld.pcmk from 3.0.17
> clvmd 2.02.85
> clusterlib-3.1.1
>
This indicates that the kernel has paused scheduling of corosync (or that corosync has blocked) for the time value printed in the message. Corosync itself is non-blocking.

Are you running inside a VM? Increasing the token timeout is probably a necessity when running inside a VM on a heavily loaded host, because KVM does not schedule as fairly as bare metal.

Please let me know whether this is bare metal or a VM.

Regards
-steve

> Best,
> Vladislav
> _______________________________________________
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
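To make "the kernel has paused scheduling of corosync" concrete, here is a minimal Python sketch of how a process can notice that it was not scheduled. A periodic timer refreshes a timestamp; if that timestamp is stale when the receive path next runs, the process concludes it was paused. The class `PauseDetector`, the timer interval of token/5 and the threshold of token/2 are hypothetical choices for illustration, not corosync 1.3.1's actual code.

```python
import time
from typing import Callable, Optional


class PauseDetector:
    """Hypothetical sketch of scheduler-pause detection (not corosync code).

    Assumptions: a periodic timer calls on_timer() every token_timeout_ms / 5,
    and a pause is reported when the last tick is older than token_timeout_ms / 2.
    """

    def __init__(self, token_timeout_ms: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.token_timeout_ms = token_timeout_ms
        self.clock = clock            # returns seconds; injectable for testing
        self.last_tick = clock()

    def tick_interval_ms(self) -> float:
        # How often the (assumed) periodic timer should refresh the timestamp.
        return self.token_timeout_ms / 5

    def on_timer(self) -> None:
        # Run by the event loop's timer. If the process is not scheduled,
        # this never runs, so last_tick goes stale.
        self.last_tick = self.clock()

    def check(self) -> Optional[float]:
        """Return the pause length in ms if it exceeds the threshold, else None."""
        elapsed_ms = (self.clock() - self.last_tick) * 1000.0
        if elapsed_ms > self.token_timeout_ms / 2:
            return elapsed_ms
        return None


if __name__ == "__main__":
    class FakeClock:
        """Manually advanced clock so the demo is deterministic."""
        def __init__(self) -> None:
            self.t = 0.0

        def __call__(self) -> float:
            return self.t

    clock = FakeClock()
    detector = PauseDetector(token_timeout_ms=1000, clock=clock)

    clock.t = 0.2          # timer fires on schedule
    detector.on_timer()
    clock.t = 0.3          # messages processed shortly afterwards
    print(detector.check())

    clock.t = 2.0          # no timer ticks for 1.8 s: process was not scheduled
    pause = detector.check()
    if pause is not None:
        print(f"Process pause detected for {pause:.0f} ms")
```

Because the threshold in this sketch scales with the token timeout, a larger token tolerates longer scheduling hiccups, which matches the intuition behind raising it on a busy VM host.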