On Wednesday, 22 December 2010 08:29:02 -0500, Shravan Mishra wrote:

> Hi,

Hi, Shravan.

> What's happening is that corosync is forking but the exec is not
> happening.

And do you think that what is shown in the logs is consistent with what
is shown using ps?

> I used to see this problem in my case when syslog-ng process was not
> running.
>
> Try checking that and starting it and then start corosync.

Now I see that if I shut down the node that holds the resource
(failover-ip), the resource does not migrate to the other node. Before
running the test I made sure that Pacemaker and Corosync were working
correctly on both nodes.

Before shutting down Atlantis:

-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:24:09 2010
Stack: openais
Current DC: atlantis - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ atlantis daedalus ]

failover-ip (ocf::heartbeat:IPaddr): Started atlantis
-----------------------------------------------------------------------

After shutting down Atlantis:

-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:25:44 2010
Stack: openais
Current DC: daedalus - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ daedalus ]
OFFLINE: [ atlantis ]
-----------------------------------------------------------------------

Here I'm using a configuration like the one presented in the wiki [1].

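The two crm_mon outputs above differ in two places: the quorum state on
the "Current DC" line and whether failover-ip is listed as Started. A
minimal sketch in Python for checking both mechanically, assuming the
"crm_mon --one-shot" text layout shown in this message. The function
name summarize_crm_mon and the keys of the returned dictionary are made
up for illustration and are not part of Pacemaker:

```python
import re

# Sample taken from the "after shutdown" output in this message.
AFTER_SHUTDOWN = """\
============
Last updated: Thu Dec 23 19:25:44 2010
Stack: openais
Current DC: daedalus - partition WITHOUT quorum
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ daedalus ]
OFFLINE: [ atlantis ]
"""


def summarize_crm_mon(text):
    """Pull quorum state, node lists and started resources out of
    crm_mon --one-shot text (hypothetical helper, layout assumed)."""
    summary = {
        "dc": None,        # name of the current DC
        "quorum": None,    # True / False, None if the line is missing
        "online": [],
        "offline": [],
        "resources": {},   # resource name -> node it is started on
    }
    for raw in text.splitlines():
        line = raw.strip()

        # e.g. "Current DC: daedalus - partition WITHOUT quorum"
        m = re.match(r"Current DC: (\S+) - partition (with|WITHOUT) quorum", line)
        if m:
            summary["dc"] = m.group(1)
            summary["quorum"] = m.group(2) == "with"
            continue

        # e.g. "Online: [ atlantis daedalus ]" or "OFFLINE: [ atlantis ]"
        m = re.match(r"(Online|OFFLINE): \[(.*)\]", line)
        if m:
            summary[m.group(1).lower()] = m.group(2).split()
            continue

        # e.g. "failover-ip (ocf::heartbeat:IPaddr): Started atlantis"
        m = re.match(r"(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)", line)
        if m:
            summary["resources"][m.group(1)] = m.group(3)
    return summary


if __name__ == "__main__":
    print(summarize_crm_mon(AFTER_SHUTDOWN))
```

For the second output above, this reports that the partition has no
quorum and that no resource is started, which is the state I am asking
about.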
I also notice that after Atlantis boots again, corosync forks without
doing the exec (as we assumed from what I showed in the previous mail),
and only then does the resource migrate to Daedalus:

-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 19:49:11 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ daedalus ]
OFFLINE: [ atlantis ]

failover-ip (ocf::heartbeat:IPaddr): Started daedalus
-----------------------------------------------------------------------

-----------------------------------------------------------------------
atlantis:~# crm_mon --one-shot

Connection to cluster failed: connection failed
-----------------------------------------------------------------------

I tried doing a "corosync stop", but the processes do not exit:

atlantis:~# ps auxf
[...]
root      1564  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1565  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1566  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1567  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1568  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1569  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync

The only way I found to start corosync correctly is to do a
"pkill -9 corosync" followed by a "corosync start":

atlantis:~# ps auxf
[...]
root      2120  0.2  1.9 134288  5060 ?        Ssl  19:59   0:00 /usr/sbin/corosync
root      2128  0.0  4.5  76028 11600 ?        SLs  19:59   0:00  \_ /usr/lib/heartbeat/stonithd
105       2129  0.1  2.0  79104  5120 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/cib
root      2130  0.0  0.8  71580  2108 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/lrmd
105       2131  0.0  1.3  79968  3340 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/attrd
105       2132  0.0  1.1  80332  2892 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/pengine
105       2133  0.0  1.4  86216  3764 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/crmd

After this, the resource automatically migrates back to Atlantis:

-----------------------------------------------------------------------
daedalus:~# crm_mon --one-shot
============
Last updated: Thu Dec 23 20:03:18 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ atlantis daedalus ]

failover-ip (ocf::heartbeat:IPaddr): Started atlantis
-----------------------------------------------------------------------

Any idea how to fix this problem with Corosync? And why does the
resource not migrate to Daedalus when Atlantis is shut down?

Thanks for your reply.

Regards,
Daniel

[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
-- 
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
17:52:45 up 71 days, 18:19, 10 users, load average: 0.00, 0.01, 0.03

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker