Re: [Linux-HA] Machine readable cluster status

2009-08-07 Thread Michael Schwartzkopff
On Friday, 7 August 2009 15:30:31, Denis Chapligin wrote:
> Hi!
>
> Is there any tool that can be used to retrieve a machine-readable
> cluster status? crm_mon -s -1 doesn't show resource state. I've also
> tried parsing 'cibadmin --query' output, but it only gives me
> information about node states, while I'm interested in resource states
> too.

Yes. It is called the SNMP subagent for Linux-HA. Every management system can 
talk to it.
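
Once the subagent is registered with the local snmpd you can walk its
resource table, roughly like this (MIB object and community string are from
memory, so check your installation):

snmpwalk -v2c -c public localhost LINUX-HA-MIB::LHAResourceTable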

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Machine readable cluster status

2009-08-07 Thread Denis Chapligin
Hi!

Is there any tool that can be used to retrieve a machine-readable
cluster status? crm_mon -s -1 doesn't show resource state. I've also
tried parsing 'cibadmin --query' output, but it only gives me
information about node states, while I'm interested in resource states
too.

-- 
Denis Chapligin


[Linux-HA] Resolved Re: problem starting apache with heartbeat v2 CentOS 5.3

2009-08-07 Thread Testuser SST
Hi,

looks like the apache on my systems does not like the command:
sh -c wget -O- -q -L --bind-address=127.0.0.1 http://*:80/server-status | tr
'\012' ' ' | grep -Ei "[[:space:]]*" >/dev/null

especially the "http://*:80/server-status" part won't create any request
entries in the access log of the httpd. But when I use the command with
http://127.0.0.1:80/server-status it works.

So I use a statusurl entry in my apache resource:

<nvpair name="statusurl" value="http://127.0.0.1:80/server-status"/>
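
Setting it with crm_resource should work as well, if your version supports
the -p/-v options for resource parameters:

crm_resource -r service_apache -p statusurl -v "http://127.0.0.1:80/server-status"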

Kind Regards

SST

 Original Message 
> Date: Thu, 06 Aug 2009 18:23:03 +0200
> From: "Testuser  SST" 
> To: linux-ha@lists.linux-ha.org
> Subject: [Linux-HA] problem starting apache with heartbeat v2 CentOS 5.3

> Hi,
> 
> I'm trying to start an apache on top of a drbd. It seems to me as if the
> monitoring function at startup fails. The apache service is running, and
> when apache is started without heartbeat, I can access the /server-status
> page at ease with lynx on the command line, and wget is installed. Maybe
> it's a timeout problem? Here are some debug lines:
> 
> lrmd[5697]: 2009/08/06_15:57:57 info: rsc:service_apache: start
> crmd[5700]: 2009/08/06_15:57:57 debug: do_lrm_rsc_op: Recording pending
> op: 20 - service_apache_start_0 service_apache:20
> mgmtd[5708]: 2009/08/06_15:57:57 debug: update cib finished
> crmd[5700]: 2009/08/06_15:57:57 debug: cib_rsc_callback: Resource update
> 32 complete: rc=0
> apache[6673]:   2009/08/06_15:57:57 INFO: apache not running
> apache[6673]:   2009/08/06_15:57:57 INFO: waiting for apache
> /etc/httpd/conf/httpd.conf to come up
> apache[6673]:   2009/08/06_15:57:58 ERROR: command failed: sh -c wget -O-
> -q -L --bind-address=127.0.0.1 http://*:80/server-status | tr '\012' ' ' |
> grep -Ei
>  "[[:space:]]*" >/dev/null
> lrmd[5697]: 2009/08/06_15:57:58 WARN: Managed service_apache:start process
> 6673 exited with return code 1.
> crmd[5700]: 2009/08/06_15:57:58 ERROR: process_lrm_event: LRM operation
> service_apache_start_0 (call=20, rc=1) Error unknown error
> crmd[5700]: 2009/08/06_15:57:58 debug: build_operation_update: Calculated
> digest 78bd7553bdf10a58c69f3a3b70d66ba0 for service_apache_start_0
> (4:1;43:3:ef8ed907-59b7-42b0-9c6f-dff1f8a6d25e)
> crmd[5700]: 2009/08/06_15:57:58 debug: log_data_element:
> build_operation_update: digest:source <parameters
> configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd"/>
> crmd[5700]: 2009/08/06_15:57:58 debug: get_rsc_metadata: Retreiving
> metadata for apache::ocf:heartbeat
> crmd[5700]: 2009/08/06_15:57:58 debug: append_restart_list: Resource
> service_apache does not support reloads
> crmd[5700]: 2009/08/06_15:57:58 debug: do_update_resource: Sent resource
> state update message: 33
> crmd[5700]: 2009/08/06_15:57:58 debug: process_lrm_event: Op
> service_apache_start_0 (call=20): Confirmed
> mgmtd[5708]: 2009/08/06_15:57:59 debug: update cib finished
> crmd[5700]: 2009/08/06_15:57:59 debug: cib_rsc_callback: Resource update
> 33 complete: rc=0
> mgmtd[5708]: 2009/08/06_15:58:00 debug: update cib finished
> crmd[5700]: 2009/08/06_15:58:01 info: do_lrm_rsc_op: Performing
> op=service_apache_stop_0 key=2:4:ef8ed907-59b7-42b0-9c6f-dff1f8a6d25e)
> lrmd[5697]: 2009/08/06_15:58:01 debug: on_msg_perform_op: add an operation
> operation stop[21] on ocf::apache::service_apache for client 5700, its
> parameters:
>  CRM_meta_timeout=[2] crm_feature_set=[2.0]  to the operation list.
> lrmd[5697]: 2009/08/06_15:58:01 info: rsc:service_apache: stop
> crmd[5700]: 2009/08/06_15:58:01 debug: do_lrm_rsc_op: Recording pending
> op: 21 - service_apache_stop_0 service_apache:21
> apache[6782]:   2009/08/06_15:58:02 INFO: Killing apache PID 6712
> apache[6782]:   2009/08/06_15:58:02 INFO: apache stopped.
> lrmd[5697]: 2009/08/06_15:58:02 info: Managed service_apache:stop process
> 6782 exited with return code 0.
> crmd[5700]: 2009/08/06_15:58:02 info: process_lrm_event: LRM operation
> service_apache_stop_0 (call=21, rc=0) complete
> crmd[5700]: 2009/08/06_15:58:02 debug: do_update_resource: Sent resource
> state update message: 34
> crmd[5700]: 2009/08/06_15:58:02 debug: process_lrm_event: Op
> service_apache_stop_0 (call=21): Confirmed
> 
> 
> 
> 
> any suggestions are welcome
> 
> Kind Regards
> 
> SST
> 
> 


[Linux-HA] Pacemaker 1.4 & HBv2 1.99 // About quorum choice

2009-08-07 Thread Alain.Moulle
Hi,

ok, but do you agree that in case of a heartbeat network problem there will
be a "race to stonith" between all nodes in the cluster, and so the risk that
both nodes get killed is not zero ?
That's why I thought that a ping towards a device outside the cluster
should reduce the risk of split brain: suppose that each node pings its
Eth switch (each node connected to a different Eth switch), and suppose that
there is a network problem on one side only. The node which has the problem
will fail its ping and suicide, whereas the node which can still ping its
Eth switch will not suicide and will stonith the other one.
Do you agree with this "theory" ?
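
For reference, the pingd rule I have in mind looks roughly like this
(resource and attribute names are only examples; my_ping_set is the
attribute published by pingd):

<rsc_location id="my_resource_connected" rsc="my_resource">
  <rule id="my_resource_connected_rule" score="-INFINITY" boolean_op="or">
    <expression id="exp1" attribute="my_ping_set" operation="not_defined"/>
    <expression id="exp2" attribute="my_ping_set" operation="lte" value="0"/>
  </rule>
</rsc_location>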
Thanks
Alain
> > And how should we proceed to avoid split-brain cases in a two-node
> > cluster in case of problems on the heartbeat network ?
>
> Make "network" "networks" (plural) to reduce the chance of getting into
> a split-brain situation, and get and configure stonith devices to
> protect your data in case it happens anyway.
>
> Regards
> Dominik


Re: [Linux-HA] nodes from 1 cluster appearing in another cluster

2009-08-07 Thread Michael Schwartzkopff
On Friday, 7 August 2009 10:28:52, Yan Gao wrote:
> >>> On 8/6/2009 at 8:46 PM, Bernie Wu wrote:
> >
> > Hi Listers,
> > We are running heartbeat 2.1.3-0.9 under zVM 5.4 / SLES10-SP2.
> > We have 2 test clusters, one with 3 nodes and the other with 2 nodes.
> >
> > How do I prevent the nodes from one cluster showing up in the other
> > cluster and vice versa ?
> > Here is my ha.cf for both clusters:
> > Cluster 1:
> > use_logd on
> > autojoin other
> > node lnxhat1 lnxhat2
> > bcast hsi0
> > crm on
> > ping 172.22.4.1  172.31.100.31  172.31.100.32
> > respawn root /usr/lib64/heartbeat/pingd -m 2000 -d 5s -a my_ping_set
> >
> > Cluster2:
> > use_logd on
> > autojoin other
> > node lnodbbt  lnodbct
> > bcast hsi0
> > crm on
>
> Different udpports, different authkeys, or no autojoin.

Different udpports are OK.

If you just use different keys you will get a LOT of "auth failed" entries in 
the log file.
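
For example, give each cluster its own port in ha.cf (udpport defaults to
694):

# cluster 1 ha.cf
udpport 694

# cluster 2 ha.cf
udpport 695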

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42

Re: [Linux-HA] nodes from 1 cluster appearing in another cluster

2009-08-07 Thread Yan Gao


>>> On 8/6/2009 at 8:46 PM, Bernie Wu wrote:
> Hi Listers, 
> We are running heartbeat 2.1.3-0.9 under zVM 5.4 / SLES10-SP2. 
> We have 2 test clusters, one with 3 nodes and the other with 2 nodes.

> How do I prevent the nodes from one cluster showing up in the other
> cluster and vice versa ?
> Here is my ha.cf for both clusters: 
> Cluster 1: 
> use_logd on 
> autojoin other 
> node lnxhat1 lnxhat2 
> bcast hsi0 
> crm on 
> ping 172.22.4.1  172.31.100.31  172.31.100.32 
> respawn root /usr/lib64/heartbeat/pingd -m 2000 -d 5s -a my_ping_set

> Cluster2: 
> use_logd on 
> autojoin other 
> node lnodbbt  lnodbct 
> bcast hsi0 
> crm on 

Different udpports, different authkeys, or no autojoin.



Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com

Novell, Inc.
Making IT Work As One™

Re: [Linux-HA] Pacemaker 1.4 & HBv2 1.99 // About quorum choice (contd.)

2009-08-07 Thread Dominik Klein
Alain.Moulle wrote:
> Hello Andrew,
> Could you explain why this functionality is no longer available
> (the configuration lines remain in ha.cf) ?

ipfail was replaced by pingd in v2. That was in the very first version
of v2 afaik.
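
A typical pingd setup in ha.cf looks like this (IP address, multiplier and
attribute name are just examples):

ping 172.22.4.1
respawn root /usr/lib64/heartbeat/pingd -m 100 -d 5s -a my_ping_set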

> And how should we proceed to avoid split-brain cases in a two-node
> cluster in case of problems on the heartbeat network ?

Make "network" "networks" (plural) to reduce the chance of getting into
a split-brain situation, and get and configure stonith devices to
protect your data in case it happens anyway.
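
In ha.cf that simply means more than one communication path, e.g.:

bcast eth0
bcast eth1
# or, additionally, a serial link:
serial /dev/ttyS0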

Regards
Dominik


[Linux-HA] Flooded Heartbeat log (logfile)

2009-08-07 Thread Ehlers, Kolja
Hi,

my cluster is running fine, but my logfile is being flooded with these messages:

lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: 
tomcat38-www2test_monitor_0 on www2test returned 0 (ok) instead of the
expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation 
tomcat38-www2test_monitor_0 found resource
tomcat38-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: 
tomcat38j-www2test_monitor_0 on www2test returned 0 (ok) instead of
the expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation 
tomcat38j-www2test_monitor_0 found resource
tomcat38j-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: 
tomcat37-www2test_monitor_0 on www2test returned 0 (ok) instead of the
expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation 
tomcat37-www2test_monitor_0 found resource
tomcat37-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 ERROR: crm_int_helper: Characters left 
over after parsing 'INFINITY': 'INFINITY'

Why does the cluster expect all resources to not be running? I have
configured an asymmetric opt-in cluster, and this is what my setup looks
like:

Resource Group: IP_and_Apache
IPaddr  (ocf::heartbeat:IPaddr):Started www2test
apache2 (ocf::cr:apache):   Started www2test
tomcat21-www1test   (ocf::cr:tomcat):   Started www1test
tomcat22-www1test   (ocf::cr:tomcat):   Started www1test
tomcat22sdb-www1test(ocf::cr:tomcat):   Started www1test
tomcat30-www1test   (ocf::cr:tomcat):   Started www1test
tomcat34-www1test   (ocf::cr:tomcat):   Started www1test
tomcat35-www1test   (ocf::cr:tomcat):   Started www1test
tomcat36-www1test   (ocf::cr:tomcat):   Started www1test
tomcat37-www1test   (ocf::cr:tomcat):   Started www1test
tomcat38-www1test   (ocf::cr:tomcat):   Started www1test
tomcat38j-www1test  (ocf::cr:tomcat):   Started www1test
tomcat21-www2test   (ocf::cr:tomcat):   Started www2test
tomcat22-www2test   (ocf::cr:tomcat):   Started www2test
tomcat22sdb-www2test(ocf::cr:tomcat):   Started www2test
tomcat30-www2test   (ocf::cr:tomcat):   Started www2test
tomcat34-www2test   (ocf::cr:tomcat):   Started www2test
tomcat35-www2test   (ocf::cr:tomcat):   Started www2test
tomcat36-www2test   (ocf::cr:tomcat):   Started www2test
tomcat37-www2test   (ocf::cr:tomcat):   Started www2test
tomcat38-www2test   (ocf::cr:tomcat):   Started www2test
tomcat38j-www2test  (ocf::cr:tomcat):   Started www2test

I have these constraints for each resource (two constraint XML snippets per
resource; the XML itself was stripped from the archive).

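They follow the usual opt-in pattern, per resource roughly like this (shape
reconstructed, ids shortened):

<rsc_location id="loc-tomcat38-www2test" rsc="tomcat38-www2test">
  <rule id="loc-tomcat38-www2test-rule" score="INFINITY">
    <expression id="loc-tomcat38-www2test-expr" attribute="#uname"
                operation="eq" value="www2test"/>
  </rule>
</rsc_location>
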
Thanks 

Kolja


Re: [Linux-HA] How to remove nodes with hb_gui

2009-08-07 Thread Yan Gao


>>> On 8/6/2009 at 8:21 PM, Bernie Wu wrote:
> Thanks Yan Gao for the reply.  We're using heartbeat 2.1.3-0.9 running
> under zVM 5.4 / SLES10-SP2.  So I guess I have to use cibadmin.  So here
> goes:
>  
> 1.  # cibadmin -Q | grep node 
>  value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/> 
>   
> type="normal"/> 
> type="normal"/> 
> type="normal"/> 
> type="normal"/> 
> type="normal"/> 
>   
>   crmd="online" crm-debug-origin="do_lrm_query" shutdown="0"
in_ccm="true"  
> ha="active" join="member" expected="member"> 
>   
>   ha="active" crm-debug-origin="do_update_resource" crmd="online"
shutdown="0"  
> in_ccm="true" join="member" expected="member"> 
>   
>  
> 2.  I then run:
> cibadmin -D -o nodes -X '<node uname="lnodbbt" type="normal"/>'
> Call cib_delete failed (-42): Write requires quorum
>  
Hmm, that modification does need quorum for heartbeat-2.1. Please try
the "-f" option.

>  
> 3.  What now ?  The node that I am trying to delete belongs to another
> cluster.
>  
> TIA 
>  
> On Wed, 2009-08-05 at 20:42 -0400, Bernie Wu wrote: 
> > Hi Listers, 
> > How can I remove nodes that currently appear in my Linux HA Management
> > Client ?
> If it's a heartbeat based cluster, first you should run hb_delnode to
> delete the nodes.
>
> And then delete them from the cib:
> If you are using the latest cluster stack, you could either delete them
> via the GUI if you have pacemaker-mgmt installed, or run "crm node
> delete ...".
> If you are still using heartbeat-2.1, you have to run cibadmin to delete
> them.
>  
> >   These nodes belong to another cluster and they appear as stopped.

> > 
> > TIA 
> > Bernie 
> > 
> >  
> -- 
> Regards, 
> Yan Gao 
> China R&D Software Engineer 
> y...@novell.com 
>  
> Novell, Inc. 
> Making IT Work As One(tm) 
>  



Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com

Novell, Inc.
Making IT Work As One™