Re: [Linux-HA] Resources get restarted when a node joins the cluster
On Fri, May 29, 2009 at 10:30 AM, Tobias Appel wrote:
> Well, exactly what I expected happened!
> I set the 2nd node to standby - it had no resources running. We stopped
> Heartbeat on the 2nd node and did some maintenance. When we started
> Heartbeat again it joined the cluster as Online-standby and guess what!
>
> The resources on node 01 were getting stopped and restarted by heartbeat!
>
> Now why the hell did heartbeat do this and how can I stop heartbeat from
> doing this in the future?

Attach a hb_report archive to a bugzilla entry so that the developers
have a chance to fix it :-)
I've seen this with clones, where the PE isn't always smart enough to do
the right thing, but never for groups.

> Another very weird thing was that it did not stop all the resources.
> We have configured one resource group only, containing 6 resources in
> the following order:
> mount filesystem
> virtual ip
> afd
> cups
> nfs
> mailto notification
>
> It stopped the mailto and tried to stop NFS, which failed since NFS was
> in use. Instead of going into an unmanaged state, it just left it
> running and started mailto again.
> No error was shown in crm_mon and the cluster luckily for us kept on
> running. But we did get 2 emails from mailto.
>
> Now why did Heartbeat behave like this? We even had a constraint in
> place which forces the resource group on node 01 (score infinity).
>
> If anyone can shed any light on this matter please do. This is
> essential for me.
>
> Regards,
> Tobi

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Resources get restarted when a node joins the cluster
Well, exactly what I expected happened!
I set the 2nd node to standby - it had no resources running. We stopped
Heartbeat on the 2nd node and did some maintenance. When we started
Heartbeat again it joined the cluster as Online-standby and guess what!

The resources on node 01 were getting stopped and restarted by heartbeat!

Now why the hell did heartbeat do this and how can I stop heartbeat from
doing this in the future?

Another very weird thing was that it did not stop all the resources.
We have configured one resource group only, containing 6 resources in
the following order:
mount filesystem
virtual ip
afd
cups
nfs
mailto notification

It stopped the mailto and tried to stop NFS, which failed since NFS was
in use. Instead of going into an unmanaged state, it just left it
running and started mailto again.
No error was shown in crm_mon and the cluster luckily for us kept on
running. But we did get 2 emails from mailto.

Now why did Heartbeat behave like this? We even had a constraint in
place which forces the resource group on node 01 (score infinity).

If anyone can shed any light on this matter please do. This is
essential for me.

Regards,
Tobi
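The partial stop Tobias describes (mailto stopped, then a failed attempt on NFS) matches how a resource group is ordered: members start in the listed order and stop in reverse. Below is a minimal sketch of that ordering only, not Heartbeat's actual code. The names `GROUP`, `stop_group` and `can_stop` are hypothetical, and the failing NFS stop is simulated by the callback.

```python
# Hypothetical model of the group from the message, in start order.
GROUP = ["filesystem", "virtual_ip", "afd", "cups", "nfs", "mailto"]


def stop_group(members, can_stop):
    """Stop members in reverse start order and halt at the first failure.

    Returns (stopped, failed): the members stopped so far, and the member
    whose stop failed (None if every stop succeeded).
    """
    stopped = []
    for name in reversed(members):
        if not can_stop(name):
            return stopped, name
        stopped.append(name)
    return stopped, None


# Simulate NFS being busy, so that its stop operation fails.
stopped, failed = stop_group(GROUP, lambda name: name != "nfs")
print(stopped, failed)  # ['mailto'] nfs
```

In this model only mailto has been stopped by the time the NFS stop fails, which would explain why the other four resources were never touched and why mailto was the only one started again.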
Re: [Linux-HA] Resources get restarted when a node joins the cluster
Andrew Beekhof wrote:
> On Tue, May 26, 2009 at 2:56 PM, Tobias Appel wrote:
>> When it came back online and joined the cluster again 3 different
>> scenarios happened:
>>
>> 1. all resources failed over to the newly joined node
>> 2. all resources stay on the current node but get restarted!
>
> Usually 1 and 2 occur when services are started by the node when it
> boots up (i.e. not by the cluster).
> The cluster then detects this, stops them everywhere and starts them
> on just one node.
>
> Cluster resources must never be started automatically by the node at boot
> time.

I noticed the same behaviour. Once the standby node is activated again,
the resources stay on the same node but get restarted. The standby
server is not restarted at all and no services are started along with it.
In my case the resources were Xen domains.

Thanks,
Jan
Re: [Linux-HA] Resources get restarted when a node joins the cluster
Thanks Andrew,

I'll double check that nothing gets started automatically.
Wish me luck :)
Re: [Linux-HA] Resources get restarted when a node joins the cluster
On Tue, May 26, 2009 at 2:56 PM, Tobias Appel wrote:
> Hi,
>
> In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:
>
> 2-Node Cluster, all resources run on one node - no location constraints
> Now I restarted the "standby" node (which had no resources running but
> was still active inside the cluster).
> When it came back online and joined the cluster again 3 different
> scenarios happened:
>
> 1. all resources failed over to the newly joined node
> 2. all resources stay on the current node but get restarted!

Usually 1 and 2 occur when services are started by the node when it
boots up (i.e. not by the cluster).
The cluster then detects this, stops them everywhere and starts them
on just one node.

Cluster resources must never be started automatically by the node at boot
time.

> 3. nothing happens
>
> Now I don't know why 1. or 2. happen but I remember seeing a mail on the
> mailing list from someone with a similar problem. Is there any way to
> make sure heartbeat does NOT touch the resources, especially not
> restarting or re-locating them?
>
> Thanks in advance,
> Tobi
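The rule that cluster resources must never be started by the node at boot can be checked mechanically. Below is a minimal sketch, assuming a SysV-init system where `chkconfig --list` prints one line per service in the form `name 0:off 1:off 2:on ...`. The function name `boot_enabled`, the sample output and the chosen runlevels are hypothetical and only illustrate the idea.

```python
def boot_enabled(chkconfig_output, cluster_services, runlevels=("3", "5")):
    """Return the cluster-managed services that init would also start at boot.

    chkconfig_output: text in the assumed `chkconfig --list` format.
    cluster_services: names of the init scripts that the cluster manages.
    """
    offenders = []
    for line in chkconfig_output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in cluster_services:
            continue
        # Every remaining field is expected to look like "3:on" or "3:off".
        states = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if any(states.get(level) == "on" for level in runlevels):
            offenders.append(fields[0])
    return sorted(offenders)


# Hypothetical sample; on a real node this text would come from chkconfig.
sample = """\
cups    0:off 1:off 2:on  3:on  4:on  5:on  6:off
nfs     0:off 1:off 2:off 3:off 4:off 5:off 6:off
sshd    0:off 1:off 2:on  3:on  4:on  5:on  6:off
"""
print(boot_enabled(sample, {"cups", "nfs"}))  # ['cups']
```

Any service the function reports is started both by init and by the cluster, which is the situation that leads to the stop-everywhere-then-restart behaviour described above. It should be disabled in the boot configuration on every node.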
[Linux-HA] Resources get restarted when a node joins the cluster
Hi,

In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:

2-Node Cluster, all resources run on one node - no location constraints
Now I restarted the "standby" node (which had no resources running but
was still active inside the cluster).
When it came back online and joined the cluster again 3 different
scenarios happened:

1. all resources failed over to the newly joined node
2. all resources stay on the current node but get restarted!
3. nothing happens

Now I don't know why 1. or 2. happen but I remember seeing a mail on the
mailing list from someone with a similar problem. Is there any way to
make sure heartbeat does NOT touch the resources, especially not
restarting or re-locating them?

Thanks in advance,
Tobi