On Fri, May 29, 2009 at 10:30 AM, Tobias Appel <tap...@eso.org> wrote:
> Well, exactly what I expected happened!
> I set the 2nd node to standby - it had no resources running. We stopped
> Heartbeat on the 2nd node and did some maintenance. When we started
> Heartbeat again it joined the cluster as Online-standby and guess what!
>
> The resources on node 01 were getting stopped and restarted by heartbeat!
>
> Now why the hell did heartbeat do this and how can I stop heartbeat from
> doing this in the future?
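For a planned maintenance window like this, one option is to tell the CRM not to manage resources at all while you work. This is only a sketch: the option name below (`is_managed_default`) and the `crm_attribute` flags are from memory of the 2.1.x era and may differ on your release, so verify them with `crm_attribute --help` before relying on this.

```shell
# Sketch: freeze resource management cluster-wide before maintenance.
# "is_managed_default" is assumed to be the 2.1.x-era CRM option name;
# check your version's documentation before use.
crm_attribute -t crm_config -n is_managed_default -v false

# ... stop Heartbeat on the standby node, do maintenance, start it again ...

# Re-enable resource management afterwards:
crm_attribute -t crm_config -n is_managed_default -v true
```

With management disabled, the cluster records resource state but does not stop, start, or move anything, which should avoid the restart-on-rejoin behaviour described above.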
Attach a hb_report archive to a bugzilla entry so that the developers
have a chance to fix it :-)  I've seen this with clones, where the PE
isn't always smart enough to do the right thing, but never for groups.

> Another very weird thing was that it did not stop all the resources.
> We have configured only one resource group, containing 6 resources in
> the following order:
>   mount filesystem
>   virtual ip
>   afd
>   cups
>   nfs
>   mailto notification
>
> It stopped the mailto and tried to stop NFS, which failed since NFS was
> in use. Instead of going into an unmanaged state, it just left NFS
> running and started mailto again.
> No error was shown in crm_mon, and the cluster, luckily for us, kept on
> running. But we did get 2 emails from mailto.
>
> Now why did Heartbeat behave like this? We even have a constraint in
> place which forces the resource group onto node 01 (score INFINITY).
>
> If anyone can shed any light on this matter, please do. This is
> essential for me.
>
> Regards,
> Tobi
>
>
> Andrew Beekhof wrote:
>> On Tue, May 26, 2009 at 2:56 PM, Tobias Appel <tap...@eso.org> wrote:
>>> Hi,
>>>
>>> In the past, the following sometimes happened on my Heartbeat 2.1.14
>>> cluster:
>>>
>>> 2-node cluster, all resources run on one node - no location constraints.
>>> Now I restarted the "standby" node (which had no resources running but
>>> was still active inside the cluster).
>>> When it came back online and joined the cluster again, 3 different
>>> scenarios happened:
>>>
>>> 1. all resources failed over to the newly joined node
>>> 2. all resources stay on the current node but get restarted!
>>
>> Usually 1 and 2 occur when services are started by the node when it
>> boots up (i.e. not by the cluster).
>> The cluster then detects this, stops them everywhere and starts them
>> on just one node.
>>
>> Cluster resources must never be started automatically by the node at
>> boot time.
>>
>>> 3. nothing happens
>>>
>>> Now I don't know why 1. or 2.
>>> happen, but I remember seeing a mail on the mailing list from someone
>>> with a similar problem. Is there any way to make sure heartbeat does
>>> NOT touch the resources, especially not restarting or relocating them?
>>>
>>> Thanks in advance,
>>> Tobi

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
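[Editor's note: the hb_report suggestion above, as a concrete sketch. The date/time and destination path are placeholders for this incident, not values from the thread; flags varied between releases, so check hb_report's man page or `hb_report --help` on your version.]

```shell
# Sketch: collect logs, the CIB, and PE inputs from around the incident
# into an archive suitable for attaching to a bugzilla entry.
# -f gives the start of the period of interest; the final argument names
# the report destination. Run as root on one cluster node.
hb_report -f "2009-05-29 10:00" /tmp/standby-restart-report
```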