Hi,

Your configuration is straightforward, nothing out of the ordinary.
Make sure that when your other box comes back up, syslog-ng is started before corosync. It appears that by the time you kill all the processes and restart corosync, syslog-ng has already started, which is why everything then comes up properly.

Your resource migrates back because nothing makes it stick to the node it is running on, i.e. no resource-stickiness is set. You might want to look into how to set resource-stickiness, which may mean extending your config a little beyond what you have now. The configuration manual explains it very nicely.

There is also a tool called ptest that shows the scores that determine where a resource is placed. For example, you can experiment with different resource-stickiness values and then run ptest -sL to look at the scores.

You will have to go a bit deeper than your vanilla config, and read the manual, to understand this.

Thanks
-Shravan

On Thu, Dec 23, 2010 at 6:12 PM, Daniel Bareiro <daniel-lis...@gmx.net> wrote:
> On Wednesday, 22 December 2010 08:29:02 -0500,
> Shravan Mishra wrote:
>
>> Hi,
>
> Hi, Shravan.
>
>> What's happening is that corosync is forking but the exec is not
>> happening.
>
> And do you think that what is shown in the logs is consistent with what
> is shown using ps?
>
>> I used to see this problem in my case when syslog-ng process was not
>> running.
>>
>> Try checking that and starting it and then start corosync.
>
> Now I see that if I do a shutdown of the node that has the resource
> (failover-ip), then this does not migrate to another node. By doing the
> test I made sure Pacemaker + Corosync are functioning correctly on both
> nodes before doing a shutdown of Atlantis.
>
> Before making a shutdown of Atlantis:
>
> -----------------------------------------------------------------------
> daedalus:~# crm_mon --one-shot
> ============
> Last updated: Thu Dec 23 19:24:09 2010
> Stack: openais
> Current DC: atlantis - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> Online: [ atlantis daedalus ]
>
> failover-ip (ocf::heartbeat:IPaddr): Started atlantis
> -----------------------------------------------------------------------
>
> After doing a shutdown of Atlantis:
>
> -----------------------------------------------------------------------
> daedalus:~# crm_mon --one-shot
> ============
> Last updated: Thu Dec 23 19:25:44 2010
> Stack: openais
> Current DC: daedalus - partition WITHOUT quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> Online: [ daedalus ]
> OFFLINE: [ atlantis ]
> -----------------------------------------------------------------------
>
> Here I'm using a configuration like the one presented in the wiki [1].
>
> I am also noting that after the Atlantis launch, corosync makes the fork
> without exec (as we assume from what I showed in the previous mail) and
> only now is when the resource migrates to Daedalus:
>
> -----------------------------------------------------------------------
> daedalus:~# crm_mon --one-shot
> ============
> Last updated: Thu Dec 23 19:49:11 2010
> Stack: openais
> Current DC: daedalus - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> Online: [ daedalus ]
> OFFLINE: [ atlantis ]
>
> failover-ip (ocf::heartbeat:IPaddr): Started daedalus
> -----------------------------------------------------------------------
>
>
> -----------------------------------------------------------------------
> atlantis:~# crm_mon --one-shot
>
> Connection to cluster failed: connection failed
> -----------------------------------------------------------------------
>
> I tried doing a "corosync stop", but the processes are not closed:
>
> atlantis:~# ps auxf
> [...]
> root 1564 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
> root 1565 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
> root 1566 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
> root 1567 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
> root 1568 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
> root 1569 0.0 1.2 168144 3240 ? S 19:38 0:00 /usr/sbin/corosync
>
>
> The only way I found to correctly start corosync is doing a "pkill -9
> corosync" and "corosync start":
>
>
> atlantis:~# ps auxf
> [...]
> root 2120 0.2 1.9 134288 5060 ? Ssl 19:59 0:00 /usr/sbin/corosync
> root 2128 0.0 4.5 76028 11600 ? SLs 19:59 0:00 \_ /usr/lib/heartbeat/stonithd
> 105 2129 0.1 2.0 79104 5120 ? S 19:59 0:00 \_ /usr/lib/heartbeat/cib
> root 2130 0.0 0.8 71580 2108 ? S 19:59 0:00 \_ /usr/lib/heartbeat/lrmd
> 105 2131 0.0 1.3 79968 3340 ? S 19:59 0:00 \_ /usr/lib/heartbeat/attrd
> 105 2132 0.0 1.1 80332 2892 ? S 19:59 0:00 \_ /usr/lib/heartbeat/pengine
> 105 2133 0.0 1.4 86216 3764 ? S 19:59 0:00 \_ /usr/lib/heartbeat/crmd
>
>
> After this, the resource automatically migrates back to Atlantis:
>
> -----------------------------------------------------------------------
> daedalus:~# crm_mon --one-shot
> ============
> Last updated: Thu Dec 23 20:03:18 2010
> Stack: openais
> Current DC: daedalus - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> Online: [ atlantis daedalus ]
>
> failover-ip (ocf::heartbeat:IPaddr): Started atlantis
> -----------------------------------------------------------------------
>
>
> Any idea how to fix this problem with Corosync?
>
> Why to do a shutdown of Atlantis the resource does not migrate to
> Daedalus?
>
>
>
> Thanks for your reply.
>
> Regards,
> Daniel
>
> [1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
> --
> Daniel Bareiro - GNU/Linux registered user #188.598
> Proudly running Debian GNU/Linux with uptime:
> 17:52:45 up 71 days, 18:19, 10 users, load average: 0.00, 0.01, 0.03
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAk0T11kACgkQZpa/GxTmHTejywCfdVBAfru12t1LL8kvDiSCYGpJ
> c9YAnjlbFMF9NzFWKCsA1vkzdCfOCmJr
> =7Gh3
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
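
P.S. A rough sketch of the stickiness experiment mentioned at the top of this reply. It assumes the crm shell and ptest that ship with Pacemaker 1.0 are available on the node, and that setting the value through rsc_defaults (a cluster-wide default) is acceptable for your setup. The function name try_stickiness and the default value 100 are only illustrative, not recommendations. Note that it changes the live cluster configuration, so try it on a test cluster first.

```shell
#!/bin/sh
# Sketch only: set a default resource-stickiness and show the resulting scores.
# Assumes the Pacemaker 1.0 tools "crm" and "ptest" are installed.

try_stickiness() {
    value="$1"
    # Set a cluster-wide default stickiness for all resources
    # (assumption: a global default is what you want, rather than a
    # per-resource meta attribute).
    crm configure rsc_defaults resource-stickiness="$value" || return 1
    echo "== resource-stickiness=$value =="
    # -L works on the live cluster state, -s prints the allocation scores.
    ptest -sL
}

# Only touch the cluster if both tools are actually present on this box.
if command -v crm >/dev/null 2>&1 && command -v ptest >/dev/null 2>&1; then
    try_stickiness "${1:-100}"
else
    echo "crm and ptest not found; run this on a cluster node" >&2
fi
```

Run it with a different value each time (for example 0, 100, INFINITY) and compare the scores printed for failover-ip on atlantis and daedalus.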