Hi, I'm running Cluster 2 on RHEL 5.2 and I'm seeing issues when I reboot a node. (I saw the same behavior on 5.1 and updated just yesterday to see whether that would fix it, but no luck.) I tried increasing post_join_delay to 60 and the totem token to 25000, but neither change made a difference.
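For reference, these two settings normally live in /etc/cluster/cluster.conf. The fragment below is a minimal sketch assuming the standard RHEL 5 layout. The cluster name and config_version are placeholders rather than values from this cluster, and all other elements (clusternodes, fencedevices, rm) are omitted:

```xml
<?xml version="1.0"?>
<!-- Sketch only: name and config_version are placeholders -->
<cluster name="examplecluster" config_version="2">
  <!-- seconds fenced waits after a node joins the fence domain before fencing -->
  <fence_daemon post_join_delay="60"/>
  <!-- totem token timeout, in milliseconds -->
  <totem token="25000"/>
  <!-- clusternodes, fencedevices and rm sections omitted -->
</cluster>
```

config_version has to be incremented each time the file is changed so that the updated configuration is picked up by the other nodes.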
During boot, when the cman init script runs, I see openais messages on the currently running node for anywhere from 15 to 30 seconds:

May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61:
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration:
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61)
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left:
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined:
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration:
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61)
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left:
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined:
May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:20 lxomp83k openais[3602]: [CLM ] got nodejoin message 151.117.65.61
May 22 11:52:20 lxomp83k openais[3602]: [CPG ] got joinlist message from node 1
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9.

That repeats until I finally see this...

May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61:
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62:
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration:
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61)
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left:
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined:
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration:
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61)
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62)
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left:
May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined:
May 22 11:52:27 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62)
May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service.
May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state

At this point, once the second node has come up, I can log in and run service cman stop and then service cman start. On that start the node joins the cluster immediately with no issue.

[EMAIL PROTECTED] ~]# uname -a
Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[EMAIL PROTECTED] ~]# rpm -q cman
cman-2.0.84-2.el5

Any suggestions?

TIA,
Jeremy