Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Karthikeyan Ramasamy
Yes, Ken.  There were multiple instances of the external script also running.  
What do you think could possibly be wrong with the script that triggers the 
crm_mon process every time?

We are on RHEL 6.5.  I am not sure what the plan is for moving to RHEL 6.7 or 7.1.  

Thanks,
Karthik.

-Original Message-
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: 02 November 2015 21:04
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
> Thanks, Mr. Gaillot.
> 
> Yes, we trigger the SNMP notification with an external script.  From your 
> response, I understand that the issue wouldn't occur with 1.1.14, as it 
> wouldn't require the crm_mon process.  Is this understanding correct?

Correct, crm_mon is not spawned with the new method. However, if the problem 
originates in the external script, then it still might not work properly, but 
with the new method Pacemaker will kill it after a timeout.

> We have been given 1.1.10 as the supported version from Red Hat.  If I raise a 
> ticket with Red Hat, would they be able to provide us a patch for 1.1.10?

If you're using RHEL 6.7, you should be able to simply "yum update" to get 
1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
That would give you more bugfixes, which may or may not help your issue.
If you're using an older version, there may not be updates any longer.

If you open a ticket, support can help you isolate where the problem is.

When you saw many crm_mon processes running, did you also see many copies of 
the external script running?

> Many thanks for your response.
> 
> Thanks,
> Karthik.
> 
> 
> Ken Gaillot wrote:
> 
> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>> Dear Pacemaker support,
>> We are using Pacemaker 1.1.10-14 to implement a service management 
>> framework, with high availability on the roadmap.  This Pacemaker 
>> version was available through Red Hat for our environments.
>>
>>   We are running into an issue where Pacemaker causes a node to crash.  The 
>> last feature we integrated was SNMP notification.  When the node crashed, a 
>> process listing showed crm_mon processes occupying 58 GB of the available 
>> 64 GB.  When we removed that feature, Pacemaker was stable again.
>>
>> Section 7.1 of the Pacemaker documentation explains that the SNMP 
>> notification agent triggers a crm_mon process at regular intervals.  On 
>> checking clusterlabs.org for the list of known issues, we found the crm_mon 
>> memory leak thread below.  Although it may not be related, we think there is 
>> some problem with the crm_mon process.
>>
>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>
>> Can you please let us know if there are known issues with SNMP notification 
>> in Pacemaker, or if there is anything we could be doing wrong?  Any available 
>> workarounds for this issue would also be very helpful.  Please help.
>>
>> Thanks,
>> Karthik.
> 
> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or 
> an external script that generates the SNMP trap?
> 
> If you're using the built-in capability, that has to be explicitly 
> enabled when Pacemaker is compiled. Many distributions (including 
> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
> option, you have it enabled, otherwise not.
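> 
> A quick non-interactive version of that check might look like this (the 
> grep pattern is deliberately loose, since the exact option text varies 
> by build):
> 
>   # Probe whether this crm_mon binary was compiled with SNMP support.
>   # An SNMP-related option is listed in the help output only when the
>   # feature was enabled at compile time.
>   if crm_mon --help 2>&1 | grep -qi snmp; then
>       echo "crm_mon has built-in SNMP support"
>   else
>       echo "crm_mon was built without SNMP support"
>   fi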
> 
> If you're using an external script to generate the SNMP trap, please 
> post it (with any sensitive info taken out of course).
> 
> The ClusterMon resource will spawn a crm_mon process at regular intervals, 
> but each run should exit quickly. It sounds like it's not exiting at all, 
> which is why you see this problem.
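> 
> For illustration, a minimal external agent that follows that rule might 
> look like the sketch below. It assumes the CRM_notify_* environment 
> variables that crm_mon exports to external agents, plus net-snmp's 
> snmptrap; the manager host, community string and OIDs are placeholders.
> 
>   #!/bin/sh
>   # Hypothetical external agent called by crm_mon for each event.
>   # Collect the event details crm_mon passes in the environment.
>   node="${CRM_notify_node:-unknown}"
>   rsc="${CRM_notify_rsc:-unknown}"
>   task="${CRM_notify_task:-unknown}"
>   rc="${CRM_notify_rc:-0}"
> 
>   # Send one trap and exit immediately: if this script blocks, every
>   # crm_mon that spawned it blocks too, and processes pile up.
>   snmptrap -v 2c -c public nms.example.com '' \
>       .1.3.6.1.4.1.32723.300.1 \
>       .1.3.6.1.4.1.32723.300.1.1 s "$node/$rsc/$task/rc=$rc"
>   exit 0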
> 
> If you have a RHEL subscription, you can open a support ticket with 
> Red Hat. Note that stonith must be enabled before Red Hat (and many 
> other
> vendors) will support a cluster. Also, you should be able to "yum 
> update" to a much newer version of Pacemaker to get bugfixes, if 
> you're using RHEL 6 or 7.
> 
> FYI, the latest upstream Pacemaker has a new feature that will be in 
> 1.1.14, allowing it to call an external notification script without 
> needing a ClusterMon resource.
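> 
> Until then, the external agent is wired in through a ClusterMon 
> resource. A sketch with pcs, where the resource name and script path 
> are made up and the update interval should be checked against the 
> agent's metadata:
> 
>   # Run crm_mon in the background on every node and have it invoke the
>   # external agent; extra_options is passed through to crm_mon.
>   pcs resource create ClusterMon-SNMP ocf:pacemaker:ClusterMon \
>       user=root update=30 \
>       extra_options="-E /usr/local/bin/snmp-notify.sh" \
>       --clone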



Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Karthikeyan Ramasamy
Many thanks.  Will check and get back.

-Original Message-
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: 02 November 2015 21:21
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process every time?

It's the other way around: crm_mon spawns the script. So if the script doesn't 
exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes away, 
then it's in the script somewhere. Mostly you want to make sure the script 
completes within your monitoring interval.

> We are on RHEL 6.5.  I am not sure what the plan is for moving to RHEL 6.7 or 7.1.  
> 
> Thanks,
> Karthik.

Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process every time?

It's the other way around: crm_mon spawns the script. So if the script
doesn't exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes
away, then it's in the script somewhere. Mostly you want to make sure
the script completes within your monitoring interval.
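
In script form, the test is just a stub like this (using a made-up path
for the agent):

  #!/bin/sh
  # Temporary debugging stub for /usr/local/bin/snmp-notify.sh: bail out
  # before doing any real work. If the crm_mon pile-up stops, the hang
  # is somewhere in the script body that used to follow this line.
  exit 0

A belt-and-braces variant is to keep the real script but bound its slow
part with timeout(1), e.g. "timeout 10 snmptrap ...", so it can never
outlive the monitoring interval.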

> We are on RHEL 6.5.  I am not sure what the plan is for moving to RHEL 6.7 or 7.1.  
> 
> Thanks,
> Karthik.




Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
> Thanks, Mr. Gaillot.
> 
> Yes, we trigger the SNMP notification with an external script.  From your 
> response, I understand that the issue wouldn't occur with 1.1.14, as it 
> wouldn't require the crm_mon process.  Is this understanding correct?

Correct, crm_mon is not spawned with the new method. However, if the
problem originates in the external script, then it still might not work
properly, but with the new method Pacemaker will kill it after a timeout.

> We have been given 1.1.10 as the supported version from Red Hat.  If I raise a 
> ticket with Red Hat, would they be able to provide us a patch for 1.1.10?

If you're using RHEL 6.7, you should be able to simply "yum update" to
get 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
That would give you more bugfixes, which may or may not help your issue.
If you're using an older version, there may not be updates any longer.

If you open a ticket, support can help you isolate where the problem is.

When you saw many crm_mon processes running, did you also see many
copies of the external script running?

> Many thanks for your response.
> 
> Thanks,
> Karthik.



Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-11-02 Thread Ken Gaillot
On 11/01/2015 03:43 AM, Karthikeyan Ramasamy wrote:
> Thanks, Ken.
> 
> I understand about stonith.  We are introducing Pacemaker for an existing 
> product, not for a new product.  Currently, the client side is responsible 
> for load-balancing.
> 
> High availability for our product is the next step.  For now, we are 
> introducing Pacemaker to manage the services and to provide a single point of 
> control for managing them.  Once the customers get used to this, we will 
> introduce high availability.
> 
> About the logs, can you please let me know the symptoms that I need to look 
> for?

I'd look for anything "unusual", but that's hard to describe and nearly
impossible if you're not familiar with what's "usual". I'd look for
something repeating over and over in a short time (1 or 2 seconds).

Can you give a general idea of the cluster environment? How many
resources, what cluster options are set, whether configuration changes
are being made frequently, whether failures are common, whether the
network is reliable with low latency, etc.

You might try attaching to one of the busy processes with strace and see
if it's stuck in some sort of loop.
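
For example, taking a PID from the ps listing quoted below; both
invocations are ordinary strace usage:

  # Attach to a busy daemon and count its system calls; interrupt with
  # Ctrl-C to print the summary table.
  strace -c -p 15766

  # Or watch the live call stream with timestamps to spot a tight loop
  # (strace writes to stderr, hence the redirect):
  strace -tt -p 15766 2>&1 | head -n 50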

> Thanks,
> Karthik.
> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 31 October 2015 03:33
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker process 10-15% CPU
> 
> On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
>> Hello,
>>   We are using Pacemaker to manage the services that run on a node, as part 
>> of a service management framework, and to manage the nodes running the 
>> services as a cluster.  One service will be running as 1+1 and the other 
>> services will be N+1.
>>
>>   During our testing, we see that the Pacemaker processes are taking about 
>> 10-15% of the CPU.  We would like to know if this is normal and whether the 
>> CPU utilization could be minimized.
> 
> It's definitely not normal to stay that high for very long. If you can attach 
> your configuration and a sample of your logs, we can look for anything that 
> stands out.
> 
>> Sample output of the processes using the most CPU on an active manager node:
>>
>> USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
>> 189  15766 30.4  0.0  94616 12300 ?    Ss   18:01  48:15 /usr/libexec/pacemaker/cib
>> 189  15770 28.9  0.0 118320 20276 ?    Ss   18:01  45:53 /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?    Ss   18:01   4:12 /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?    Ss   18:01  24:33 /usr/libexec/pacemaker/stonithd
>>
>> USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
>> 189  15766 30.5  0.0  94616 12300 ?    Ss   18:01  49:58 /usr/libexec/pacemaker/cib
>> 189  15770 29.0  0.0 122484 20724 ?    Rs   18:01  47:29 /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?    Ss   18:01   4:21 /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?    Ss   18:01  25:25 /usr/libexec/pacemaker/stonithd
>>
>>
>> We also observed that the processes are not distributed equally across all 
>> the available cores, and we have seen Red Hat acknowledge that RHEL doesn't 
>> distribute processes across the available cores efficiently.  We are trying 
>> to use irqbalance to spread the processes equally across the available cores.
> 
> Pacemaker is single-threaded, so each process runs on only one core.
> It's up to the OS to distribute them, and any modern Linux (including
> RHEL) will do a good job of that.
> 
> IRQBalance is useful for balancing IRQ requests across cores, but it doesn't 
> do anything about processes (and doesn't need to).
> 
>> Please let us know if there is any way we could minimize the CPU 
>> utilization.  We don't require the stonith feature, but to our knowledge 
>> there is no way to stop that daemon from running.  If that is also possible, 
>> please let us know.
>>
>> Thanks,
>> Karthik.
> 
> The logs will help figure out what's going wrong.
> 
> A lot of people would disagree that you don't require stonith :) Stonith is 
> necessary to recover from many possible failure scenarios, and without it, 
> you may wind up with data corruption or other problems.
> 
> Setting stonith-enabled=false will keep pacemaker from using stonith, but 
> stonithd will still run. It shouldn't take up significant resources.
> The load you're seeing is an indication of a problem somewhere.
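
For reference, with pcs that is a one-liner, with the caveat above very
much in force:

  # Tell Pacemaker not to use fencing; stonithd itself keeps running,
  # but should stay idle. Discouraged for any production cluster.
  pcs property set stonith-enabled=false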




[ClusterLabs] Multiple OpenSIPS services on one cluster

2015-11-02 Thread Nuno Pereira
Hi all.

We have one cluster with 9 nodes and 20 resources.

Four of those hosts are PSIP-SRV01-active, PSIP-SRV01-passive,
PSIP-SRV02-active and PSIP-SRV02-passive.

They should provide an lsb:opensips service, two by two:

- The SRV01-opensips and SRV01-IP resources should be active on one of
  PSIP-SRV01-active or PSIP-SRV01-passive;

- The SRV02-opensips and SRV02-IP resources should be active on one of
  PSIP-SRV02-active or PSIP-SRV02-passive.

The relevant configuration is the following:

Resources:
 Resource: SRV01-IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.0.1 cidr_netmask=27
  Meta Attrs: target-role=Started
  Operations: monitor interval=8s (SRV01-IP-monitor-8s)
 Resource: SRV01-opensips (class=lsb type=opensips)
  Operations: monitor interval=8s (SRV01-opensips-monitor-8s)
 Resource: SRV02-IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.0.2 cidr_netmask=27
  Operations: monitor interval=8s (SRV02-IP-monitor-8s)
 Resource: SRV02-opensips (class=lsb type=opensips)
  Operations: monitor interval=30 (SRV02-opensips-monitor-30)

Location Constraints:
  Resource: SRV01-opensips
    Enabled on: PSIP-SRV01-active (score:100) (id:prefer1-srv01-active)
    Enabled on: PSIP-SRV01-passive (score:99) (id:prefer3-srv01-active)
  Resource: SRV01-IP
    Enabled on: PSIP-SRV01-active (score:100) (id:prefer-SRV01-ACTIVE)
    Enabled on: PSIP-SRV01-passive (score:99) (id:prefer-SRV01-PASSIVE)
  Resource: SRV02-IP
    Enabled on: PSIP-SRV02-active (score:100) (id:prefer-SRV02-ACTIVE)
    Enabled on: PSIP-SRV02-passive (score:99) (id:prefer-SRV02-PASSIVE)
  Resource: SRV02-opensips
    Enabled on: PSIP-SRV02-active (score:100) (id:prefer-SRV02-ACTIVE)
    Enabled on: PSIP-SRV02-passive (score:99) (id:prefer-SRV02-PASSIVE)

Ordering Constraints:
  SRV01-IP then SRV01-opensips (score:INFINITY) (id:SRV01-opensips-after-ip)
  SRV02-IP then SRV02-opensips (score:INFINITY) (id:SRV02-opensips-after-ip)

Colocation Constraints:
  SRV01-opensips with SRV01-IP (score:INFINITY) (id:SRV01-opensips-with-ip)
  SRV02-opensips with SRV02-IP (score:INFINITY) (id:SRV02-opensips-with-ip)

Cluster Properties:
 cluster-infrastructure: cman
 ...
 symmetric-cluster: false
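
For anyone trying to reproduce this, the SRV01 half of the constraints
above corresponds roughly to the following pcs commands (a sketch only;
syntax as remembered for the cman-era pcs):

  # Location preferences: favour the active node, fall back to passive.
  pcs constraint location SRV01-opensips prefers PSIP-SRV01-active=100
  pcs constraint location SRV01-opensips prefers PSIP-SRV01-passive=99
  pcs constraint location SRV01-IP prefers PSIP-SRV01-active=100
  pcs constraint location SRV01-IP prefers PSIP-SRV01-passive=99

  # Keep the service with its VIP, and bring the VIP up first.
  pcs constraint colocation add SRV01-opensips with SRV01-IP INFINITY
  pcs constraint order SRV01-IP then SRV01-opensips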

Everything works fine until the moment one of those nodes is rebooted. In the
most recent case the problem occurred after a reboot of PSIP-SRV01-passive,
which wasn't providing the service at that moment.

Note that all the opensips nodes originally had the opensips service configured
to start at boot by init.d, which has since been removed.
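
On RHEL 6 that removal was done along these lines (shown for
completeness; only the cluster should now start or stop opensips):

  # Make sure init never starts opensips at boot on any of the four
  # nodes; Pacemaker alone manages the lsb:opensips resources.
  chkconfig opensips off
  service opensips stop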

The problem is that the service SRV01-opensips is detected as started on both
PSIP-SRV01-active and PSIP-SRV01-passive, and SRV02-opensips is detected as
started on both PSIP-SRV01-active and PSIP-SRV02-active.

After that, the cluster performs several operations, including attempts to stop
SRV01-opensips on both PSIP-SRV01-active and PSIP-SRV01-passive and to stop
SRV02-opensips on PSIP-SRV01-active and PSIP-SRV02-active. The stop fails on
PSIP-SRV01-passive, and the resource SRV01-opensips becomes unmanaged.

Any ideas on how to fix this?

Nuno Pereira
G9Telecom



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org