On Fri, May 29, 2009 at 10:30 AM, Tobias Appel <tap...@eso.org> wrote:
> Well, exactly what I expected happened!
> I set the 2nd node to standby - it had no resources running. We stopped
> Heartbeat on the 2nd node and did some maintenance. When we started
> Heartbeat again it joined the cluster as Online-standby and guess what!
>
> The resources on node 01 were getting stopped and restarted by heartbeat!
>
> Now why the hell did heartbeat do this and how can I stop heartbeat from
> doing this in the future?

Attach a hb_report archive to a bugzilla entry so that the developers
have a chance to fix it :-)

I've seen this with clones where the PE isn't always smart enough to
do the right thing, but never for groups.
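If the goal is simply to keep resources pinned where they already are when the standby node rejoins, raising the stickiness usually helps. A sketch using the 2.1.x-era CRM tools (this is my suggestion, not something from your config; adjust to taste):

```shell
# Sketch: tell the PE to strongly prefer current placement, so a node
# rejoining the cluster should not by itself trigger a move.
# Assumes the Heartbeat 2.1.x crm_attribute tool is in PATH.
crm_attribute --type crm_config --name default-resource-stickiness --update INFINITY
```

Note this only discourages moves; it cannot prevent restarts that happen because the cluster detects a resource already active on the joining node.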

>
> Another very weird thing was that it did not stop all the resources.
> We have configured one resource group only, containing 6 resources in
> the following order:
> mount filesystem
> virtual ip
> afd
> cups
> nfs
> mailto notification
>
> it stopped the mailto resource and tried to stop NFS, which failed since
> NFS was still in use. Instead of going into an unmanaged state, it just
> left NFS running and started mailto again.
> No error was shown in crm_mon and the cluster luckily for us kept on
> running. But we did get 2 emails from mailto.
>
> Now why did Heartbeat behave like this? We even had a constraint in
> place which forces the resource group on node 01 (score infinity).
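For reference, an INFINITY location constraint in the CIB would look roughly like this (an XML sketch; `group_1` and `node01` are placeholders for your actual group and node names):

```xml
<rsc_location id="loc_group_on_node01" rsc="group_1">
  <rule id="loc_group_on_node01_rule" score="INFINITY">
    <expression attribute="#uname" operation="eq" value="node01"/>
  </rule>
</rsc_location>
```

Keep in mind a location constraint only controls *where* the group may run, not *whether* it gets restarted in place, which is why it would not prevent the stop/start cycle you saw.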
>
> If anyone can shed any light on this matter please do. This is
> essential for me.
>
> Regards,
> Tobi
>
>
> Andrew Beekhof wrote:
>> On Tue, May 26, 2009 at 2:56 PM, Tobias Appel <tap...@eso.org> wrote:
>>> Hi,
>>>
>>> In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:
>>>
>>> 2-Node Cluster, all resources run one node - no location constraints
>>> Now I restarted the "standby" node (which had no resources running but
>>> was still active inside the cluster).
>>> When it came back online and joined the cluster again 3 different
>>> scenarios happened:
>>>
>>> 1. all resources failed over to the newly joined node
>>> 2. all resources stay on the current node but get restarted!
>>
>> Usually 1 and 2 occur when services are started by the node itself at
>> boot time (i.e. not by the cluster).
>> The cluster then detects this, stops them everywhere and starts them
>> on just one node.
>>
>> Cluster resources must never be started automatically by the node at boot 
>> time.
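Concretely, that means disabling the init scripts for every cluster-managed service and letting heartbeat perform all starts and stops. A sketch for a RHEL-style init system (the service names here are placeholders for whatever your distribution calls them):

```shell
# Cluster-managed services must not start at boot; only heartbeat should.
chkconfig nfs off
chkconfig cups off
# On a Debian-style system the equivalent would be:
#   update-rc.d -f nfs remove
chkconfig --list nfs   # verify: all runlevels should show "off"
```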
>>
>>> 3. nothing happens
>>>
>>> Now I don't know why 1. or 2. happen, but I remember seeing a mail on the
>>> mailing list from someone with a similar problem. Is there any way to
>>> make sure heartbeat does NOT touch the resources, especially not
>>> restarting or relocating them?
>>>
>>> Thanks in advance,
>>> Tobi
>>> _______________________________________________
>>> Linux-HA mailing list
>>> Linux-HA@lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
>
>