Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-29 Thread Tobias Appel
Thanks Andrew, I'll double check that nothing gets started automatically.

Wish me luck :)

Andrew Beekhof wrote:
 On Tue, May 26, 2009 at 2:56 PM, Tobias Appel tap...@eso.org wrote:
 Hi,

 In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:

 2-Node Cluster, all resources run on one node - no location constraints
 Now I restarted the standby node (which had no resources running but
 was still active inside the cluster).
 When it came back online and joined the cluster again 3 different
 scenarios happened:

 1. all resources failed over to the newly joined node
 2. all resources stay on the current node but get restarted!
 
 Usually 1 and 2 occur when services are started by the node when it
 boots up (i.e. not by the cluster).
 The cluster then detects this, stops them everywhere and starts them
 on just one node.
 
 Cluster resources must never be started automatically by the node at boot 
 time.
 


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-29 Thread Jan Kalcic
Andrew Beekhof wrote:
 On Tue, May 26, 2009 at 2:56 PM, Tobias Appel tap...@eso.org wrote:
   
 Hi,

 In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:

 2-Node Cluster, all resources run on one node - no location constraints
 Now I restarted the standby node (which had no resources running but
 was still active inside the cluster).
 When it came back online and joined the cluster again 3 different
 scenarios happened:

 1. all resources failed over to the newly joined node
 2. all resources stay on the current node but get restarted!
 

 Usually 1 and 2 occur when services are started by the node when it
 boots up (i.e. not by the cluster).
 The cluster then detects this, stops them everywhere and starts them
 on just one node.

 Cluster resources must never be started automatically by the node at boot 
 time.

   
I noticed the same behaviour. Once the standby node is activated
again, the resources stay on the same node but get restarted. The
standby server is not restarted at all, and no services are started
along with it. In my case the resources were Xen domains.

Thanks,
Jan


Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-29 Thread Tobias Appel
Well, exactly what I expected happened!
I set the 2nd node to standby - it had no resources running. We stopped 
Heartbeat on the 2nd node and did some maintenance. When we started 
Heartbeat again it joined the cluster as Online-standby and guess what!

The resources on node 01 were getting stopped and restarted by heartbeat!

Now why the hell did heartbeat do this and how can I stop heartbeat from 
doing this in the future?

Another very weird thing was that it did not stop all the resources.
We have configured one resource group only, containing 6 resources in 
the following order:
mount filesystem
virtual ip
afd
cups
nfs
mailto notification

It stopped mailto and tried to stop NFS, which failed because NFS was 
in use. Instead of going into an unmanaged state, it just left NFS 
running and started mailto again.
No error was shown in crm_mon, and luckily for us the cluster kept on 
running. But we did get 2 emails from mailto.
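
The sequence above is consistent with how a group is ordered: members start in the listed order and stop in reverse, so mailto is stopped first and nfs second. A toy model in Python, not the policy engine's actual logic; the function names, the resource labels, and the assumption that processing halts at the first failed stop are illustrative only:

```python
def stop_group(members, stop):
    """Stop group members in reverse start order, halting at the first failure.

    `stop` is a callable returning True when a member stopped cleanly.
    Returns (stopped, failed): the members stopped so far and the member
    whose stop failed (None if every stop succeeded).
    """
    stopped = []
    for name in reversed(members):
        if not stop(name):
            return stopped, name
        stopped.append(name)
    return stopped, None


# Start order as listed in this message (labels are shorthand, not real IDs).
GROUP = ["filesystem", "virtual_ip", "afd", "cups", "nfs", "mailto"]


def nfs_is_busy(name):
    """Pretend every stop succeeds except nfs, which is still in use."""
    return name != "nfs"


if __name__ == "__main__":
    stopped, failed = stop_group(GROUP, nfs_is_busy)
    print(stopped, failed)  # prints ['mailto'] nfs
```

In this model only mailto is ever stopped, so restarting it would account for two notifications (one on stop, one on start) while the four resources below nfs are never touched.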

Now why did Heartbeat behave like this? We even had a constraint in 
place which forces the resource group on node 01 (score infinity).

If anyone can shed any light on this matter, please do. This is 
essential for me.

Regards,
Tobi





Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-29 Thread Andrew Beekhof
On Fri, May 29, 2009 at 10:30 AM, Tobias Appel tap...@eso.org wrote:
 Well, exactly what I expected happened!
 I set the 2nd node to standby - it had no resources running. We stopped
 Heartbeat on the 2nd node and did some maintenance. When we started
 Heartbeat again it joined the cluster as Online-standby and guess what!

 The resources on node 01 were getting stopped and restarted by heartbeat!

 Now why the hell did heartbeat do this and how can I stop heartbeat from
 doing this in the future?

Attach a hb_report archive to a bugzilla entry so that the developers
have a chance to fix it :-)

I've seen this with clones where the PE isn't always smart enough to
do the right thing, but never for groups.





Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-27 Thread Andrew Beekhof
On Tue, May 26, 2009 at 2:56 PM, Tobias Appel tap...@eso.org wrote:
 Hi,

 In the past sometimes the following happened on my Heartbeat 2.1.14 cluster:

 2-Node Cluster, all resources run on one node - no location constraints
 Now I restarted the standby node (which had no resources running but
 was still active inside the cluster).
 When it came back online and joined the cluster again 3 different
 scenarios happened:

 1. all resources failed over to the newly joined node
 2. all resources stay on the current node but get restarted!

Usually 1 and 2 occur when services are started by the node when it
boots up (i.e. not by the cluster).
The cluster then detects this, stops them everywhere and starts them
on just one node.

Cluster resources must never be started automatically by the node at boot time.
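
One way to audit this, sketched in Python under the assumption of SysV-style rc directories (start links named S<NN><service> inside rc<runlevel>.d): list which cluster-managed services still have a boot-time start link. The function name boot_enabled, the rc_root parameter and the service names are hypothetical; init script names and the location of the rc directories differ between distributions.

```python
import glob
import os
import re
import tempfile

START_LINK = re.compile(r"^S\d\d(.+)$")  # SysV start link, e.g. S20nfs


def boot_enabled(services, rc_root="/etc"):
    """Return the members of `services` that have a start link in any
    rc<runlevel>.d directory below `rc_root` (assumed SysV layout)."""
    enabled = set()
    for link in glob.glob(os.path.join(rc_root, "rc[0-9S].d", "S*")):
        match = START_LINK.match(os.path.basename(link))
        if match and match.group(1) in services:
            enabled.add(match.group(1))
    return enabled


if __name__ == "__main__":
    # Demo against a throwaway directory tree instead of the real /etc.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "rc3.d"))
        for name in ("S20nfs", "K80cups", "S75heartbeat"):
            open(os.path.join(root, "rc3.d", name), "w").close()
        # Hypothetical names for the services the cluster manages.
        print(sorted(boot_enabled({"nfs", "cups", "afd"}, root)))  # prints ['nfs']
```

Any service it reports would then be disabled at boot with the distribution's own tool, while Heartbeat itself stays enabled.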

 3. nothing happens

 Now I don't know why 1. or 2. happen but I remember seeing a mail on the
 mailing list from someone with a similar problem. Is there any way to
 make sure heartbeat does NOT touch the resources, especially not
 restarting or re-locating them?

 Thanks in advance,
 Tobi