Hideo,

> Hi All,
>
> One of our users built a cluster with corosync and Pacemaker in the following
> environment. The cluster was formed among guests.
>
> * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
> * libqb 0.17.1
> * corosync 2.3.4
> * Pacemaker 1.1.12
>
> The cluster worked well.
> When the user stopped an active guest, the following log messages were output
> repeatedly on the standby guests.

What exactly do you mean by "active guest" and "standby guests"?


> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
> (snip)
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
> May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
> (snip)


This is weird, and not because of the enormous pause length, but because corosync has a "scheduler pause" detector that warns before the "Process pause detected ..." error is logged.

> As a result, the standby guest failed to form an independent
> cluster.

> The log reads as if a timer had stopped for 91 minutes.
> A 91-minute pause is abnormally long.
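The 91-minute figure can be checked against the log excerpt. The small Python sketch below (the helper names `pause_values_ms` and `summarize` are hypothetical, written only for this illustration) pulls the first five pause values out of the excerpt, converts them to minutes and computes the gaps between consecutive reports. 5515870 ms is about 91.9 minutes, and each report is 50 to 51 ms larger than the previous one.

```python
import re

# First five "Process pause detected" entries from the standby-guest log in this thread.
LOG = """\
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
"""

# Captures the millisecond value from each pause report.
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def pause_values_ms(log_text):
    """Return every reported pause length (ms) in log order."""
    return [int(m.group(1)) for m in PAUSE_RE.finditer(log_text)]


def summarize(values_ms):
    """Return (pause lengths in minutes, gaps in ms between consecutive reports)."""
    minutes = [v / 60_000 for v in values_ms]
    gaps = [later - earlier for earlier, later in zip(values_ms, values_ms[1:])]
    return minutes, gaps


if __name__ == "__main__":
    values = pause_values_ms(LOG)
    minutes, gaps = summarize(values)
    for ms, mins in zip(values, minutes):
        print(f"{ms} ms = {mins:.1f} min")
    print("gaps between consecutive reports (ms):", gaps)
```

Across the snipped part the value keeps tracking the wall clock: the first xx:25:53 entry reports 5515870 ms and the last xx:26:03 entry reports 5526222 ms, a difference of 10352 ms, roughly the 10 seconds between the two timestamps. Since the process keeps logging every ~50 ms, this is consistent with a reference timestamp that stopped being updated about 91.9 minutes earlier rather than with a process that is actually paused.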

> Have you seen a similar problem?

Never


> I think it may be a problem in libqb, the kernel, or something else.

What virtualization technology are you using? KVM?

> * I suspect that setting the timer failed in reset_pause_timeout().

You can try putting asserts into this function, but there is really not much reason why it should fail (either malloc returns NULL or there is some nasty memory corruption).
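To illustrate the reasoning in this thread (a real stall should trip the scheduler-pause warning first, while a refresh timer that fails to re-arm would not), here is a toy event-loop model in Python. It is a sketch under stated assumptions, not corosync's implementation: `simulate`, the 50 ms tick, the 100 ms timer period, the 500 ms threshold and the order of the checks are all hypothetical. A genuine stall produces a scheduler warning followed by a process-pause report, while a timer that silently stops being re-armed produces only process-pause reports that grow by one tick each, the same shape as the standby-guest log. Passing `check_rearm=True` makes the failed re-arm raise immediately, in the spirit of the asserts suggested for reset_pause_timeout().

```python
# Toy model, NOT corosync code: the tick, timer period, threshold and check
# order are hypothetical values chosen only to illustrate two failure shapes.


def simulate(iterations, tick_ms=50, timer_period_ms=100, threshold_ms=500,
             stall_at=None, stall_ms=0, rearm_fails_at=None, check_rearm=False):
    """Run a fake event loop and return a list of (check, duration_ms) events.

    stall_at/stall_ms: at that iteration the whole process is not scheduled for stall_ms.
    rearm_fails_at:    the first re-arm attempt at or after that iteration fails
                       silently, leaving the refresh timer unarmed.
    check_rearm:       raise instead of failing silently (assert-style check).
    """
    now = 0
    last_iteration = 0           # when the loop last ran (scheduler-pause check)
    pause_timestamp = 0          # refreshed only by the periodic timer (process-pause check)
    timer_due = timer_period_ms  # None means the refresh timer is no longer armed
    events = []

    for i in range(iterations):
        now += tick_ms
        if i == stall_at:
            now += stall_ms  # genuine stall: the loop did not run at all

        # 1. Scheduler-pause check: did the loop itself go too long without running?
        gap = now - last_iteration
        if gap > threshold_ms:
            events.append(("scheduler", gap))
        last_iteration = now

        # 2. Process-pause check: is the timer-refreshed timestamp too old?
        paused_for = now - pause_timestamp
        if paused_for > threshold_ms:
            events.append(("process", paused_for))

        # 3. Refresh timer: update the timestamp, then try to re-arm the timer.
        if timer_due is not None and now >= timer_due:
            pause_timestamp = now
            rearm_ok = rearm_fails_at is None or i < rearm_fails_at
            if not rearm_ok and check_rearm:
                raise RuntimeError(f"re-arming the pause timer failed at t={now} ms")
            timer_due = now + timer_period_ms if rearm_ok else None

    return events


if __name__ == "__main__":
    print("genuine stall:", simulate(10, stall_at=5, stall_ms=1000))
    print("stale timer:  ", simulate(40, rearm_fails_at=3)[:3])
```

With the defaults, `simulate(20)` returns an empty list, `simulate(10, stall_at=5, stall_ms=1000)` returns `[('scheduler', 1050), ('process', 1100)]`, and `simulate(40, rearm_fails_at=3)` returns only `process` events, from 550 ms up to 1800 ms in 50 ms steps.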

Regards,
  Honza


> Best Regards,
> Hideo Yamauchi.


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


