[Pacemaker] Starting all resources on one node when HA starts in a 2-node configuration
Hi All,

Here is my description. While configuring HA, I used this CLI command:

    crm configure location HTTPD Httpd \
        rule $id="HTTPD-rule" 100: #uname eq hatest1 \
        rule $id="HTTPD-rule1" 200: #uname eq hatest2

where Httpd is the resource, given score 100 for hatest1 and score 200 for the second node, hatest2. Similarly, there are three other resources where I have given score 100 for the first node and score 200 for the second node. When HA starts, it checks the scores and starts the processes on hatest2.

Is there a better way to do this, such that heartbeat/pacemaker checks a node-level configuration rather than a location constraint per resource?

Regards,
Rakesh

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
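One alternative, sketched below, is to put all four resources in a group and attach a single location preference to the group, so the per-node scores are expressed once instead of once per resource. The names res2, res3, res4 are hypothetical stand-ins for the other three resources:

```
group all-services Httpd res2 res3 res4
location prefer-node all-services \
    rule $id="prefer-node-rule" 100: #uname eq hatest1 \
    rule $id="prefer-node-rule1" 200: #uname eq hatest2
```

Note that a group also implies ordering and colocation among its members; if that is not wanted, separate colocation constraints against one "anchor" resource achieve a similar effect.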
[Pacemaker] Clearing a resource which returned "not installed" from START
I am running Pacemaker 1.0.9 and Heartbeat 3.0.3. I started a resource and the agent start method returned "OCF_ERR_INSTALLED". I have fixed the problem and would like to restart the resource, but I cannot get it to restart. Any ideas?

Thanks, Bob

The failcounts are 0, as shown below and with the crm_resource command:

    # crm_mon -1 -f

    Last updated: Wed Mar 30 19:55:39 2011
    Stack: Heartbeat
    Current DC: mgraid-sd6661-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) - partition with quorum
    Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
    2 Nodes configured, unknown expected votes
    5 Resources configured.

    Online: [ mgraid-sd6661-1 mgraid-sd6661-0 ]

    Clone Set: Fencing
        Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
    Clone Set: cloneIcms
        Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
    Clone Set: cloneOmserver
        Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
    Master/Slave Set: ms-SSSD6661
        Masters: [ mgraid-sd6661-0 ]
        Slaves: [ mgraid-sd6661-1 ]
    Master/Slave Set: ms-SSJD6662
        Masters: [ mgraid-sd6661-0 ]
        Stopped: [ SSJD6662:0 ]

    Migration summary:
    * Node mgraid-sd6661-0:
    * Node mgraid-sd6661-1:

    Failed actions:
        SSJD6662:0_start_0 (node=mgraid-sd6661-1, call=27, rc=5, status=complete): not installed

I have also tried to clean up the resource with these commands:

    # crm_resource --resource SSJD6662:0 --cleanup --node mgraid-sd6661-1
    # crm_resource --resource SSJD6662:1 --cleanup --node mgraid-sd6661-1
    # crm_resource --resource SSJD6662:0 --cleanup --node mgraid-sd6661-0
    # crm_resource --resource SSJD6662:1 --cleanup --node mgraid-sd6661-0
    # crm_resource --resource ms-SSJD6662 --cleanup --node mgraid-sd6661-1
    # crm resource start SSJD6662:0

My configuration is:

    node $id="856c1f72-7cd1-4906-8183-8be87eef96f2" mgraid-sd6661-1
    node $id="f4e5e15c-d06b-4e37-89b9-4621af05128f" mgraid-sd6661-0
    primitive SSJD6662 ocf:omneon:ss \
        params ss_resource="SSJD6662" ssconf="/var/omneon/config/config.JD6662" \
        op monitor interval="3s" role="Master" timeout="7s" \
        op monitor interval="10s" role="Slave" timeout="7" \
        op stop interval="0" timeout="20" \
        op start interval="0" timeout="300"
    primitive SSSD6661 ocf:omneon:ss \
        params ss_resource="SSSD6661" ssconf="/var/omneon/config/config.SD6661" \
        op monitor interval="3s" role="Master" timeout="7s" \
        op monitor interval="10s" role="Slave" timeout="7" \
        op stop interval="0" timeout="20" \
        op start interval="0" timeout="300"
    primitive icms lsb:S53icms \
        op monitor interval="5s" timeout="7" \
        op start interval="0" timeout="5"
    primitive mgraid-stonith stonith:external/mgpstonith \
        params hostlist="mgraid-canister" \
        op monitor interval="0" timeout="20s"
    primitive omserver lsb:S49omserver \
        op monitor interval="5s" timeout="7" \
        op start interval="0" timeout="5"
    ms ms-SSJD6662 SSJD6662 \
        meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
    ms ms-SSSD6661 SSSD6661 \
        meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
    clone Fencing mgraid-stonith
    clone cloneIcms icms
    clone cloneOmserver omserver
    location ms-SSJD6662-master-w1 ms-SSJD6662 \
        rule $id="ms-SSJD6662-master-w1-rule" $role="master" 100: #uname eq mgraid-sd6661-1
    location ms-SSSD6661-master-w1 ms-SSSD6661 \
        rule $id="ms-SSSD6661-master-w1-rule" $role="master" 100: #uname eq mgraid-sd6661-0
    order orderms-SSJD6662 0: cloneIcms ms-SSJD6662
    order orderms-SSSD6661 0: cloneIcms ms-SSSD6661
    property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        dc-deadtime="5s" \
        stonith-enabled="true" \
        last-lrm-refresh="1301536426"
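For what it's worth, a couple of things sometimes help in this situation (a sketch, not verified against 1.0.9): cleaning up the master/slave parent rather than the :0/:1 instance names, and then asking the cluster to re-probe resource state once the underlying problem is fixed:

```
# crm resource cleanup ms-SSJD6662
# crm_resource --reprobe
```

The reprobe forces the lrmd to re-run the probe (monitor) operations, which should clear a stale "not installed" result if the agent's prerequisites are now in place.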
Re: [Pacemaker] lrmd: WARN: G_SIG_dispatch: Dispatch function for S 1000 ms (> 100 ms) before being called
On 3/31/2011 at 06:05 AM, Jean-Francois Malouin wrote:
> Hi,
>
> A little more than a month ago I posted about the subject line warning and
> was told that they were harmless unless very frequent. They are now
> popping up more than 10 times a day.
> I was asked to create a bug report if I wanted more info. So now I
> have an hb_report ready to go. Excuse the naive question, but where/how
> do I submit it?

http://developerbugs.linux-foundation.org/enter_bug.cgi

HTH,
Tim

--
Tim Serong
Senior Clustering Engineer, OPS Engineering, Novell Inc.
[Pacemaker] [Problem] A restart caused by a clone resource failure affects resources on other nodes
Hi All,

We tested failure of a clone resource using the following procedure.

Step 1) Start a cluster of three nodes:

    Last updated: Thu Mar 31 10:01:47 2011
    Stack: Heartbeat
    Current DC: srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311) - partition with quorum
    Version: 1.0.10-9342a4147fc69f2081f8563a34509da5be0a89d0
    3 Nodes configured, unknown expected votes
    4 Resources configured.

    Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
        main_rsc    (ocf::pacemaker:Dummy) Started
        prmDummy1:0 (ocf::pacemaker:Dummy) Started
        prmPingd:0  (ocf::pacemaker:ping) Started
    Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
        prmDummy1:1 (ocf::pacemaker:Dummy) Started
        main_rsc2   (ocf::pacemaker:Dummy) Started
        prmPingd:1  (ocf::pacemaker:ping) Started
    Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
        prmDummy1:2 (ocf::pacemaker:Dummy) Started
        prmPingd:2  (ocf::pacemaker:ping) Started

    Inactive resources:

    Migration summary:
    * Node srv01: pingd=1
    * Node srv03: pingd=1
    * Node srv02: pingd=1

Step 2) On node srv01, make the clone resource fail:

    [root@srv01 ~]# rm -rf /var/run/Dummy-prmDummy1.state

Step 3) On node srv02, the pingd clone is restarted, and as a side effect of this restart main_rsc2 is restarted as well.
* The instance numbering of the clones also becomes strange somehow.
    [root@srv02 ~]# tail -f /var/log/ha-log | grep stop
    Mar 31 10:02:22 srv02 crmd: [24471]: info: do_lrm_rsc_op: Performing key=29:4:0:6c32b0f8-d37a-4ebc-8365-30e2e02ba9d3 op=prmPingd:1_stop_0 )
    Mar 31 10:02:25 srv02 lrmd: [24468]: info: rsc:prmPingd:1:12: stop
    Mar 31 10:02:25 srv02 crmd: [24471]: info: process_lrm_event: LRM operation prmPingd:1_stop_0 (call=12, rc=0, cib-update=21, confirmed=true) ok
    Mar 31 10:02:33 srv02 crmd: [24471]: info: do_lrm_rsc_op: Performing key=9:5:0:6c32b0f8-d37a-4ebc-8365-30e2e02ba9d3 op=main_rsc2_stop_0 )
    Mar 31 10:02:33 srv02 lrmd: [24468]: info: rsc:main_rsc2:14: stop
    Mar 31 10:02:33 srv02 crmd: [24471]: info: process_lrm_event: LRM operation main_rsc2_stop_0 (call=14, rc=0, cib-update=23, confirmed=true) ok

    Last updated: Thu Mar 31 10:02:40 2011
    Stack: Heartbeat
    Current DC: srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311) - partition with quorum
    Version: 1.0.10-9342a4147fc69f2081f8563a34509da5be0a89d0
    3 Nodes configured, unknown expected votes
    4 Resources configured.

    Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
    Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
        prmDummy1:1 (ocf::pacemaker:Dummy) Started -> :1 (numbering now odd)
        prmPingd:0  (ocf::pacemaker:ping) Started -> :0 (numbering now odd)
    Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
        main_rsc    (ocf::pacemaker:Dummy) Started
        prmDummy1:2 (ocf::pacemaker:Dummy) Started -> :2 (numbering now odd)
        prmPingd:1  (ocf::pacemaker:ping) Started -> :1 (numbering now odd)

    Inactive resources:
    main_rsc2 (ocf::pacemaker:Dummy): Stopped
    Clone Set: clnDummy1
        Started: [ srv02 srv03 ]
        Stopped: [ prmDummy1:0 ]
    Clone Set: clnPingd
        Started: [ srv02 srv03 ]
        Stopped: [ prmPingd:2 ]

    Migration summary:
    * Node srv01:
        prmDummy1:0: migration-threshold=1 fail-count=1
    * Node srv03: pingd=1
    * Node srv02: pingd=1

    Failed actions:
        prmDummy1:0_monitor_1 (node=srv01, call=8, rc=7, status=complete): not running

We think the restart of pingd on node srv02 is unnecessary. Is there a method to resolve this problem?
Possibly the following bug may be related:
* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2508

I registered the log (with hb_report attached) in Bugzilla:
* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2574

Best Regards,
Hideo Yamauchi.
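The cluster configuration for this test isn't shown, but if main_rsc2 is ordered after the pingd clone as a whole, a stop of any pingd instance can ripple to other nodes. One hedged suggestion (the clone-max value below is assumed from the three-node setup): setting interleave="true" on the clone makes ordering constraints against it apply per-node, which often avoids exactly this kind of cross-node restart:

```
clone clnPingd prmPingd \
    meta clone-max="3" interleave="true"
```

Without interleave, a dependent resource waits on (and is restarted with) the whole clone set rather than just the local instance.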
[Pacemaker] updating resource attributes
What I'm looking for is a way to pass parameters to my resource's stop operation. My first attempt was to set the parameter with crm_resource and then stop the resource:

    1) crm_resource --resource myres --set-parameter myparam --parameter-value myvalue
    2) crm_resource --resource myres --set-parameter target-role --meta --parameter-value Stopped

Unfortunately, step 1 results in the resource being restarted in order to update the agent. As this resource takes time to stop and start, this is not a good design for me.

A friend suggested defining another resource with null start and stop operations and putting the params there; however, I have two objections:
1. the params would no longer be instance-specific
2. it is more difficult to access the values, i.e. instance params come in via the environment

My first choice would be to disable this restart-on-parameter-change behavior of Pacemaker. Does anyone have suggestions?

Alan
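If I remember correctly, Pacemaker can invoke the agent's reload action instead of a full restart when the changed parameter is declared with unique="0" in the agent metadata and the agent advertises a reload action; whether and how this applies to your version is worth verifying. A sketch of the relevant metadata fragments (names hypothetical):

```
<parameter name="myparam" unique="0">
  <shortdesc lang="en">Tunable consumed by the stop operation</shortdesc>
  <content type="string"/>
</parameter>

<actions>
  <action name="reload" timeout="20" />
</actions>
```

The reload action would then re-read the instance parameters from the environment without a stop/start cycle.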
[Pacemaker] lrmd: WARN: G_SIG_dispatch: Dispatch function for S 1000 ms (> 100 ms) before being called
Hi,

A little more than a month ago I posted about the subject line warning and was told that they were harmless unless very frequent. They are now popping up more than 10 times a day. I was asked to create a bug report if I wanted more info, so now I have an hb_report ready to go. Excuse the naive question, but where/how do I submit it?

thanks,
jf
Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue
Hi,

On Wed, Mar 30, 2011 at 09:26:49AM +0100, darren.mans...@opengi.co.uk wrote:
> From: Pavel Levshin [mailto:pa...@levshin.spb.ru]
> Sent: 25 March 2011 19:50
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue
>
> 25.03.2011 18:47, darren.mans...@opengi.co.uk:
>
> > We configure a virtual IP on the non-arping lo interface of both servers
> > and then configure the IPaddr2 resource with lvs_support=true. This RA
> > will remove the duplicate IP from the lo interface when it becomes
> > active. By grouping the VIP with ldirectord/LVS we can have the
> > load-balancer and VIP on one node, balancing traffic to the other node,
> > with failover where both resources fail over together.
> >
> > To do this we need to configure the VIP on lo with a 32-bit netmask, but
> > the VIP on the eth0 interface needs to have a 24-bit netmask. This has
> > worked fine up until now and we base all of our clusters on this method.
> > Now what happens is that the find_interface() routine in IPaddr2 doesn't
> > remove the IP from lo when starting the VIP resource, as it can't find
> > it due to the netmask not matching.

Can you please open a bugzilla and attach an hb_report?

Thanks,
Dejan

> Do you really need the address to be deleted from lo? Having two
> identical addresses on a Linux machine should not do any harm, if routing
> is not affected. In your case, with a /32 netmask on lo, I do not foresee
> any problems.
>
> We use it in this way, i.e. with the address set on lo permanently.
>
> --
> Pavel Levshin
>
> Thanks Pavel,
>
> However, this means I would have to disable LVS support for the resource,
> which means that to make it work with LVS I have to set lvs_support to
> false. Of course, I'll do whatever it takes on my setup to make it work,
> but it's not intuitive for other users.
>
> Regards,
> Darren Mansell
[Pacemaker] Pacemaker warm-up latency
Dear all,

I have a similar question to the one at http://oss.clusterlabs.org/pipermail/pacemaker/2011-March/009750.html.

At the moment I start corosync and pacemaker, the monitoring status from crm_mon is:

    Last updated: Wed Mar 30 21:51:49 2011
    Stack: openais
    Current DC: NONE
    2 Nodes configured, 2 expected votes
    4 Resources configured.

    Node alpha1: OFFLINE
    Node alpha2: OFFLINE

After around 1 min. the monitor status becomes:

    Last updated: Wed Mar 30 21:52:54 2011
    Stack: openais
    Current DC: alpha1 - partition with quorum
    Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
    2 Nodes configured, 2 expected votes
    4 Resources configured.

    Node alpha1: online
        Disk:0 (ocf::linbit:drbd) Master
        ClusterIP (ocf::heartbeat:IPaddr2) Started
        FS (ocf::heartbeat:Filesystem) Started
        WebSite (ocf::heartbeat:apache) Started
    Node alpha2: online
        Disk:1 (ocf::linbit:drbd) Slave

My questions are:
1. Why does the pacemaker warm-up take some time? Is it controlled by a configuration value?
2. During the warm-up time, how can I detect that the node status has become online? Should I use a shell script to parse the crm_mon result periodically until the "online" sub-string appears?

Thanks for your help in advance,
Chia-Feng Kang

This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.
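On question 2, periodically parsing crm_mon output is indeed the usual low-tech approach. A minimal sketch (node names taken from the status output above; in a real script the canned sample would be replaced by status="$(crm_mon -1)" inside a polling loop):

```shell
#!/bin/sh
# Succeed when the given crm_mon output reports the node as online.
node_is_online() {
    # $1 = node name, $2 = captured output of `crm_mon -1`
    printf '%s\n' "$2" | grep -q "Node $1.*online"
}

# Canned sample standing in for: status="$(crm_mon -1)"
status="Node alpha1: online
Node alpha2: OFFLINE"

node_is_online alpha1 "$status" && echo "alpha1 is online"
node_is_online alpha2 "$status" || echo "alpha2 is not online yet"
```

Note the grep is case-sensitive on purpose: crm_mon prints "OFFLINE" in upper case, so it does not false-match the lowercase "online" pattern.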
Re: [Pacemaker] Software-only STONITH device
Hi Dejan,

Based on the information you provided, I also studied the documentation for IBM RSA (http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-4ZVQKY) and HP iLO (http://h2.www2.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf). It seems that the minimum number of nodes required in a STONITH-enabled cluster is 3.

Thank you for your help.

Chia-Feng Kang

-----Original Message-----
From: c...@itri.org.tw [mailto:c...@itri.org.tw]
Sent: Wednesday, March 30, 2011 10:11 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Software-only STONITH device

Hello,

I learned about the IBM RSA architecture from http://www.opengear.com/SP-IBM.html. Moreover, I also plan to study external/ipmi and ibmhmc at the same time.

Assume I have a two-node cluster, and each node is equipped with more than one Ethernet interface. Can I use external/ipmi and ibmhmc to set up a STONITH-enabled cluster? Are these connections internally independent (i.e. the Ethernet interfaces of one node can't communicate with each other) for out-of-band management (http://en.wikipedia.org/wiki/Out-of-band)?

Thanks for your help again.

Chia-Feng Kang

PS: In the IBM HMC redbook at http://publib-b.boulder.ibm.com/redbooks.nsf/RedbookAbstracts/SG247038.html, it seems that a serial connection is still required if Figure 1-11 is referenced.

-----Original Message-----
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm]
Sent: Tuesday, March 29, 2011 9:04 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Software-only STONITH device

Hi,

On Tue, Mar 29, 2011 at 08:45:13PM +0800, c...@itri.org.tw wrote:
> Dear all,
>
> As a beginner to Fencing/STONITH implementation in Linux HA, I am trying
> to set up a two-node STONITH-enabled cluster environment.
>
> The STONITH devices supported in my environment are listed below:
>
> apcmaster
> apcmastersnmp
> apcsmart
> baytech
> bladehpi
> cyclades
> external/drac5
> external/dracmc-telnet
> external/hmchttp
> external/ibmrsa
> external/ibmrsa-telnet
> external/ipmi
> external/ippower9258
> external/kdumpcheck
> external/rackpdu
> external/riloe
> external/sbd
> external/vmware
> external/xen0
> external/xen0-ha
> ibmhmc
> ipmilan
> meatware
> nw_rpc100s
> rcd_serial
> rps10
> suicide
> wti_mpc
> wti_nps
>
> Is there any software-only STONITH device among them, i.e. one that works
> without an additional quorum node (please let me know the appropriate
> term, because I think the noun is not clear)?

No. The closest is rcd_serial, for which you need to build a special serial cable. Otherwise, I strongly advise obtaining either computers with some lights-out device (iLO or IBM RSA or similar) or a PDU/UPS. Take a look at http://www.clusterlabs.org/doc/crm_fencing.html for more details.

Thanks,
Dejan

> Thanks in advance.
>
> Chia-Feng Kang

This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.
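For what it's worth, with a lights-out device such as external/ipmi the fencing hardware is each node's own BMC, so each node only needs a STONITH resource that can fence the other node; a third node is not inherently required for the fencing itself. A sketch for one direction (addresses and credentials hypothetical; parameter names as I recall them for the external/ipmi plugin):

```
primitive st-node1 stonith:external/ipmi \
    params hostname="node1" ipaddr="192.168.10.101" \
           userid="admin" passwd="secret" interface="lan"
location st-node1-not-on-node1 st-node1 -inf: node1
```

A mirror-image primitive for node2, constrained away from node2, completes the pair. Quorum policy in a two-node cluster is a separate question from fencing.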
Re: [Pacemaker] How to send email notification on failure of a resource in the cluster framework
On Mar 29, 2011, at 11:34 PM, Michael Schwartzkopff wrote:

>>>>> On Mar 24, 2011, at 12:46 AM, Rakesh K wrote:
>>>>>> Hi All,
>>>>>> Is there any way to send email notifications when a resource fails
>>>>>> in the cluster framework? While I was going through the Pacemaker
>>>>>> Explained document provided on the website www.clusterlabs.org,
>>>>>> there was no content in Chapter 7, which is about sending email
>>>>>> notification of events. Can anybody help me regarding this? For now
>>>>>> I am using crm_mon --daemonize --as-html <file> to maintain the
>>>>>> status of HA in an HTML file. Is there any other approach for
>>>>>> sending email notifications?
>>>>>
>>>>> Last time I checked, crm_mon is not well suited for this purpose.
>>>>> crm_mon has the following option:
>>>>>
>>>>>     -T, --mail-to=value
>>>>>         Send Mail alerts to this user. See also --mail-from,
>>>>>         --mail-host, --mail-prefix
>>>>>
>>>>> But you will end up with an obscene amount of e-mail; I was blocked
>>>>> from gmail when I tried to use it once :) For one resource failure
>>>>> you will get 4 e-mails: monitor, stop, start, monitor. Now imagine if
>>>>> it was the most significant member of a group or, worse, a node
>>>>> failure... nagios would be better suited for this purpose, but,
>>>>> unfortunately, crm_mon has been broken
>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2344) for
>>>>> quite a while. The fix is going to have to come from the community; I
>>>>> don't have any knowledge of nagios. I am yet to find a good
>>>>> monitoring solution for pacemaker; hopefully somebody has had more
>>>>> success and will share.
>>>>
>>>> Use SNMP. It is the standard protocol for monitoring. Add an "extend"
>>>> line to your snmpd.conf to call a script that returns the number of
>>>> failcounts. You can easily monitor this with every NMS. For nagios,
>>>> use check_snmp.
>>>
>>> I'm afraid it won't be able to tell more than "stuff happened" :(
>>> Would it?
>>
>> Yes. Like a good NMS always does. To analyse the error you still have to
>> read the logs yourself.

What I meant was, I can't see how one "extend" line will be able to supply specifics about exactly which resource has failed. Would you kindly share an example?

I was trying to integrate crm_mon with the SNMP Trap Translator (snmptt), but haven't had luck with it either; I posted details in another thread. The lack of an "out-of-the-box" monitoring solution for pacemaker is a major deficiency in my daily use, and I am sure I am not alone. Maybe it's out there, but Chapter 7 of "Pacemaker Explained" is yet to be written.

Thanks,
Vadym
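To make the "extend" suggestion concrete, here is a hedged sketch. snmpd.conf would carry a line like `extend failcounts /usr/local/bin/failcounts.sh` (script name and path hypothetical), and the script would boil crm_mon's migration summary down to a number an NMS can set a threshold on. The canned sample below stands in for the output of `crm_mon -1 -f`:

```shell
#!/bin/sh
# Count fail-count entries in crm_mon output (one line per failing resource).
count_failures() {
    printf '%s\n' "$1" | grep -c 'fail-count='
}

# In failcounts.sh this would be: sample="$(crm_mon -1 -f)"
sample="Migration summary:
* Node srv01:
    prmDummy1:0: migration-threshold=1 fail-count=1
* Node srv02:"

count_failures "$sample"
```

As noted above, this only tells you how many failures there are, not which resource failed; printing the matching lines (grep instead of grep -c) in a second extend script would at least name the failing instances.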
Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue
From: Pavel Levshin [mailto:pa...@levshin.spb.ru]
Sent: 25 March 2011 19:50
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue

25.03.2011 18:47, darren.mans...@opengi.co.uk:

> We configure a virtual IP on the non-arping lo interface of both servers
> and then configure the IPaddr2 resource with lvs_support=true. This RA
> will remove the duplicate IP from the lo interface when it becomes
> active. By grouping the VIP with ldirectord/LVS we can have the
> load-balancer and VIP on one node, balancing traffic to the other node,
> with failover where both resources fail over together.
>
> To do this we need to configure the VIP on lo with a 32-bit netmask, but
> the VIP on the eth0 interface needs to have a 24-bit netmask. This has
> worked fine up until now and we base all of our clusters on this method.
> Now what happens is that the find_interface() routine in IPaddr2 doesn't
> remove the IP from lo when starting the VIP resource, as it can't find
> it due to the netmask not matching.

Do you really need the address to be deleted from lo? Having two identical addresses on a Linux machine should not do any harm, if routing is not affected. In your case, with a /32 netmask on lo, I do not foresee any problems.

We use it in this way, i.e. with the address set on lo permanently.

--
Pavel Levshin

Thanks Pavel,

However, this means I would have to disable LVS support for the resource, which means that to make it work with LVS I have to set lvs_support to false. Of course, I'll do whatever it takes on my setup to make it work, but it's not intuitive for other users.

Regards,
Darren Mansell
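For readers following along, the setup being described looks roughly like this (the address and interface names are hypothetical):

```
# On both nodes, outside the cluster, the VIP pinned to lo with a host mask:
#   ip addr add 10.0.0.100/32 dev lo

# The cluster resource for the same VIP on eth0, with the subnet's mask:
primitive vip ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.100" cidr_netmask="24" nic="eth0" lvs_support="true" \
    op monitor interval="10s"
```

With lvs_support=true, the RA is expected to remove the /32 from lo when the resource starts on a node; the netmask mismatch in find_interface() is what breaks that step.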