Re: [Linux-HA] Machine readable cluster status
On Friday, 7 August 2009 15:30:31, Denis Chapligin wrote:
> Hi!
>
> Is there any tool that can be used to retrieve machine-readable
> cluster status? crm_mon -s -1 doesn't show resource state. I've also
> tried parsing 'cibadmin --query' output, but it only gives me
> information about node states, while I'm interested in resource states
> too.

Yes. It is called the SNMP subagent for Linux-HA. Every management system can talk to it.

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Adresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Machine readable cluster status
Hi!

Is there any tool that can be used to retrieve machine-readable cluster status? crm_mon -s -1 doesn't show resource state. I've also tried parsing 'cibadmin --query' output, but it only gives me information about node states, while I'm interested in resource states too.

--
Denis Chapligin
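For what it's worth, per-resource state does live in the status section of the CIB, inside the `<lrm_rsc_op>` entries, so one workaround is to scrape the rc-code attributes out of `cibadmin --query` output. A minimal sketch, run here against a made-up CIB fragment (the resource and node names are invented, not from Denis's cluster):

```shell
# Invented CIB status fragment standing in for `cibadmin --query` output.
cib_fragment='
<status>
  <node_state uname="node1">
    <lrm_rsc_op id="apache_monitor_0" operation="monitor" rc-code="0"/>
    <lrm_rsc_op id="ip_monitor_0" operation="monitor" rc-code="7"/>
  </node_state>
</status>'

# Print "operation-id rc-code" pairs: rc-code 0 = ok/running, 7 = not running.
echo "$cib_fragment" \
  | grep -o 'id="[^"]*"[^>]*rc-code="[0-9]*"' \
  | sed 's/id="\([^"]*\)".*rc-code="\([0-9]*\)"/\1 \2/'
```

This is only a text-level scrape; the SNMP subagent mentioned above is the supported interface for management systems.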
[Linux-HA] Resolved Re: problem starting apache with heartbeat v2 CentOS 5.3
Hi,

looks like the apache on my systems does not like the command:

sh -c wget -O- -q -L --bind-address=127.0.0.1 http://*:80/server-status | tr '\012' ' ' | grep -Ei "[[:space:]]*" >/dev/null

especially the "http://*:80/server-status" part won't create any request entries in the access log of the httpd. But when I use the command with http://127.0.0.1:80/server-status it works. So I use a statusurl entry in my apache resource: statusurl="http://127.0.0.1:80/server-status".

Kind Regards
SST

Original message:
> Date: Thu, 06 Aug 2009 18:23:03 +0200
> From: "Testuser SST"
> To: linux-ha@lists.linux-ha.org
> Subject: [Linux-HA] problem starting apache with heartbeat v2 CentOS 5.3
>
> Hi,
>
> I'm trying to start an apache on top of a drbd. It seems to me as if the
> monitoring function at startup fails. The apache service is running, and
> when apache is started without heartbeat, I can access the /server-status
> page at ease with lynx on the command line, and wget is installed. Maybe
> it's a timeout problem? Here are some debug logs:
>
> lrmd[5697]: 2009/08/06_15:57:57 info: rsc:service_apache: start
> crmd[5700]: 2009/08/06_15:57:57 debug: do_lrm_rsc_op: Recording pending op: 20 - service_apache_start_0 service_apache:20
> mgmtd[5708]: 2009/08/06_15:57:57 debug: update cib finished
> crmd[5700]: 2009/08/06_15:57:57 debug: cib_rsc_callback: Resource update 32 complete: rc=0
> apache[6673]: 2009/08/06_15:57:57 INFO: apache not running
> apache[6673]: 2009/08/06_15:57:57 INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
> apache[6673]: 2009/08/06_15:57:58 ERROR: command failed: sh -c wget -O- -q -L --bind-address=127.0.0.1 http://*:80/server-status | tr '\012' ' ' | grep -Ei "[[:space:]]*" >/dev/null
> lrmd[5697]: 2009/08/06_15:57:58 WARN: Managed service_apache:start process 6673 exited with return code 1.
> crmd[5700]: 2009/08/06_15:57:58 ERROR: process_lrm_event: LRM operation service_apache_start_0 (call=20, rc=1) Error unknown error
> crmd[5700]: 2009/08/06_15:57:58 debug: build_operation_update: Calculated digest 78bd7553bdf10a58c69f3a3b70d66ba0 for service_apache_start_0 (4:1;43:3:ef8ed907-59b7-42b0-9c6f-dff1f8a6d25e)
> crmd[5700]: 2009/08/06_15:57:58 debug: log_data_element: build_operation_update: digest:source configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd"/>
> crmd[5700]: 2009/08/06_15:57:58 debug: get_rsc_metadata: Retreiving metadata for apache::ocf:heartbeat
> crmd[5700]: 2009/08/06_15:57:58 debug: append_restart_list: Resource service_apache does not support reloads
> crmd[5700]: 2009/08/06_15:57:58 debug: do_update_resource: Sent resource state update message: 33
> crmd[5700]: 2009/08/06_15:57:58 debug: process_lrm_event: Op service_apache_start_0 (call=20): Confirmed
> mgmtd[5708]: 2009/08/06_15:57:59 debug: update cib finished
> crmd[5700]: 2009/08/06_15:57:59 debug: cib_rsc_callback: Resource update 33 complete: rc=0
> mgmtd[5708]: 2009/08/06_15:58:00 debug: update cib finished
> crmd[5700]: 2009/08/06_15:58:01 info: do_lrm_rsc_op: Performing op=service_apache_stop_0 key=2:4:ef8ed907-59b7-42b0-9c6f-dff1f8a6d25e)
> lrmd[5697]: 2009/08/06_15:58:01 debug: on_msg_perform_op: add an operation stop[21] on ocf::apache::service_apache for client 5700, its parameters: CRM_meta_timeout=[2] crm_feature_set=[2.0] to the operation list.
> lrmd[5697]: 2009/08/06_15:58:01 info: rsc:service_apache: stop
> crmd[5700]: 2009/08/06_15:58:01 debug: do_lrm_rsc_op: Recording pending op: 21 - service_apache_stop_0 service_apache:21
> apache[6782]: 2009/08/06_15:58:02 INFO: Killing apache PID 6712
> apache[6782]: 2009/08/06_15:58:02 INFO: apache stopped.
> lrmd[5697]: 2009/08/06_15:58:02 info: Managed service_apache:stop process 6782 exited with return code 0.
> crmd[5700]: 2009/08/06_15:58:02 info: process_lrm_event: LRM operation service_apache_stop_0 (call=21, rc=0) complete
> crmd[5700]: 2009/08/06_15:58:02 debug: do_update_resource: Sent resource state update message: 34
> crmd[5700]: 2009/08/06_15:58:02 debug: process_lrm_event: Op service_apache_stop_0 (call=21): Confirmed
>
> Any suggestions are welcome.
>
> Kind Regards
>
> SST
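A quick way to see what the resource agent's status check does is to run its pipeline by hand. The sketch below uses an invented /server-status body in place of live wget output; the point is that with the wildcard URL, wget treats `*` as a literal host name and exits non-zero before the tr/grep stage ever sees data, which is why pointing statusurl at 127.0.0.1 fixes the probe:

```shell
# Invented stand-in for the page a working check would fetch with:
#   wget -O- -q -L --bind-address=127.0.0.1 http://127.0.0.1:80/server-status
fake_status_page='<html><body>
Apache Server Status for localhost
Total Accesses: 42
</body></html>'

# The RA-style pipeline: flatten newlines, then grep for the expected text.
if echo "$fake_status_page" | tr '\012' ' ' | grep -qi 'Server Status'; then
  echo "monitor: OK"
else
  echo "monitor: FAILED"
fi
```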
[Linux-HA] Pacemaker 1.4 & HBv2 1.99 // About quorum choice
Hi,

ok, but do you agree that in case of a heartbeat network problem there will be a "race to stonith" from all nodes in the cluster, and so the risk that both nodes will be killed is not zero?

That's why I thought that a ping towards equipment outside the cluster should reduce the risk of split brain: suppose that each node pings its Ethernet switch (each node connected to a different switch), and suppose that there is a network problem on one side only. The node which has the problem will not get ping answers and will kill itself, whereas the node which can still ping its switch will not kill itself and will stonith the other one.

Do you agree with this "theory"?

Thanks
Alain

> > And how should we proceed to avoid split-brain cases in a two-node
> > cluster in case of problems on the heartbeat network?
>
> Make "network" "networks" (plural) to reduce the chance of getting into
> a split-brain situation, and get and configure stonith devices to
> protect your data in case it happens anyway.
>
> Regards
> Dominik
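For reference, the ping-the-switch behaviour Alain describes is roughly what pingd provides in heartbeat v2. A minimal sketch, where the switch addresses, attribute name, and multiplier are invented examples, not taken from Alain's setup:

```
# ha.cf (both nodes): ping each Ethernet switch
ping 192.168.1.1 192.168.1.2
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd

# CIB location rule sketch: keep resources off a node whose
# pingd attribute is undefined (no ping node reachable)
<rsc_location id="run-where-connected" rsc="my_group">
  <rule id="pingd-defined" score="-INFINITY">
    <expression id="pingd-defined-expr"
                attribute="pingd" operation="not_defined"/>
  </rule>
</rsc_location>
```

Note that this only biases placement; it does not replace stonith, which Dominik's advice still requires for data protection.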
Re: [Linux-HA] nodes from 1 cluster appearing in another cluster
On Friday, 7 August 2009 10:28:52, Yan Gao wrote:
> >>> On 8/6/2009 at 8:46 PM, Bernie Wu wrote:
> > Hi Listers,
> > We are running heartbeat 2.1.3-0.9 under zVM 5.4 / SLES10-SP2.
> > We have 2 test clusters, one with 3 nodes and the other with 2 nodes.
> > How do I prevent the nodes from one cluster showing up in the other
> > cluster and vice versa?
> > Here is my ha.cf for both clusters:
> > Cluster 1:
> > use_logd on
> > autojoin other
> > node lnxhat1 lnxhat2
> > bcast hsi0
> > crm on
> > ping 172.22.4.1 172.31.100.31 172.31.100.32
> > respawn root /usr/lib64/heartbeat/pingd -m 2000 -d 5s -a my_ping_set
> >
> > Cluster 2:
> > use_logd on
> > autojoin other
> > node lnodbbt lnodbct
> > bcast hsi0
> > crm on
>
> Different udpports, different authkeys, or no autojoin.

Different udpports are OK. If you just use different keys you will get a LOT of "auth failed" entries in the log file.

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Re: [Linux-HA] nodes from 1 cluster appearing in another cluster
>>> On 8/6/2009 at 8:46 PM, Bernie Wu wrote:
> Hi Listers,
> We are running heartbeat 2.1.3-0.9 under zVM 5.4 / SLES10-SP2.
> We have 2 test clusters, one with 3 nodes and the other with 2 nodes.
> How do I prevent the nodes from one cluster showing up in the other
> cluster and vice versa?
> Here is my ha.cf for both clusters:
> Cluster 1:
> use_logd on
> autojoin other
> node lnxhat1 lnxhat2
> bcast hsi0
> crm on
> ping 172.22.4.1 172.31.100.31 172.31.100.32
> respawn root /usr/lib64/heartbeat/pingd -m 2000 -d 5s -a my_ping_set
>
> Cluster 2:
> use_logd on
> autojoin other
> node lnodbbt lnodbct
> bcast hsi0
> crm on

Different udpports, different authkeys, or no autojoin.

Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com
Novell, Inc.
Making IT Work As One™
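Yan's options can be sketched in ha.cf/authkeys terms. The port numbers and key strings below are invented examples, not taken from Bernie's configuration; the point is that each cluster gets its own udpport and its own shared key, and autojoin is disabled so nodes only join clusters they are listed in:

```
# Cluster 1: /etc/ha.d/ha.cf
udpport 694
autojoin none
# Cluster 1: /etc/ha.d/authkeys (must be mode 0600)
auth 1
1 sha1 secret-for-cluster-1

# Cluster 2: /etc/ha.d/ha.cf
udpport 695
autojoin none
# Cluster 2: /etc/ha.d/authkeys
auth 1
1 sha1 secret-for-cluster-2
```

Per Michael's follow-up, differing udpports alone is the clean separation; differing keys alone works but floods the logs with "auth failed" entries.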
Re: [Linux-HA] Pacemaker 1.4 & HBv2 1.99 // About quorum choice (contd.)
Alain.Moulle wrote:
> Hello Andrew,
> Could you explain why this functionality is no longer available
> (configuration lines remain in ha.cf)?

ipfail was replaced by pingd in v2. That was in the very first version of v2, afaik.

> And how should we proceed to avoid split-brain cases in a two-node
> cluster in case of problems on the heartbeat network?

Make "network" "networks" (plural) to reduce the chance of getting into a split-brain situation, and get and configure stonith devices to protect your data in case it happens anyway.

Regards
Dominik
[Linux-HA] Flooded Heartbeat log (logfile)
Hi,

my cluster is running fine but my logfile is being flooded by these messages:

lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: tomcat38-www2test_monitor_0 on www2test returned 0 (ok) instead of the expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation tomcat38-www2test_monitor_0 found resource tomcat38-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: tomcat38j-www2test_monitor_0 on www2test returned 0 (ok) instead of the expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation tomcat38j-www2test_monitor_0 found resource tomcat38j-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 info: unpack_rsc_op: tomcat37-www2test_monitor_0 on www2test returned 0 (ok) instead of the expected value: 7 (not running)
lha-snmpagent[5252]: 2009/08/07_09:38:37 notice: unpack_rsc_op: Operation tomcat37-www2test_monitor_0 found resource tomcat37-www2test active on www2test
lha-snmpagent[5252]: 2009/08/07_09:38:37 ERROR: crm_int_helper: Characters left over after parsing 'INFINITY': 'INFINITY'

Why does the cluster expect all resources to not be running?
I have configured an asymmetric opt-in cluster, and this is what my setup looks like:

Resource Group: IP_and_Apache
    IPaddr           (ocf::heartbeat:IPaddr): Started www2test
    apache2          (ocf::cr:apache):        Started www2test
tomcat21-www1test    (ocf::cr:tomcat): Started www1test
tomcat22-www1test    (ocf::cr:tomcat): Started www1test
tomcat22sdb-www1test (ocf::cr:tomcat): Started www1test
tomcat30-www1test    (ocf::cr:tomcat): Started www1test
tomcat34-www1test    (ocf::cr:tomcat): Started www1test
tomcat35-www1test    (ocf::cr:tomcat): Started www1test
tomcat36-www1test    (ocf::cr:tomcat): Started www1test
tomcat37-www1test    (ocf::cr:tomcat): Started www1test
tomcat38-www1test    (ocf::cr:tomcat): Started www1test
tomcat38j-www1test   (ocf::cr:tomcat): Started www1test
tomcat21-www2test    (ocf::cr:tomcat): Started www2test
tomcat22-www2test    (ocf::cr:tomcat): Started www2test
tomcat22sdb-www2test (ocf::cr:tomcat): Started www2test
tomcat30-www2test    (ocf::cr:tomcat): Started www2test
tomcat34-www2test    (ocf::cr:tomcat): Started www2test
tomcat35-www2test    (ocf::cr:tomcat): Started www2test
tomcat36-www2test    (ocf::cr:tomcat): Started www2test
tomcat37-www2test    (ocf::cr:tomcat): Started www2test
tomcat38-www2test    (ocf::cr:tomcat): Started www2test
tomcat38j-www2test   (ocf::cr:tomcat): Started www2test

I have these constraints for each resource: [constraint XML snipped]

Thanks
Kolja
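In an opt-in cluster (symmetric-cluster set to false), every resource needs an explicit placement rule or it runs nowhere, which is why each resource carries its own constraint. A sketch of what one of the snipped constraints might look like for a single tomcat; the ids and score are guesses for illustration, not taken from Kolja's actual CIB:

```
<rsc_location id="loc-tomcat21-www1test" rsc="tomcat21-www1test">
  <rule id="loc-tomcat21-www1test-rule" score="INFINITY">
    <expression id="loc-tomcat21-www1test-expr"
                attribute="#uname" operation="eq" value="www1test"/>
  </rule>
</rsc_location>
```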
Re: [Linux-HA] How to remove nodes with hb_gui
>>> On 8/6/2009 at 8:21 PM, Bernie Wu wrote:
> Thanks Yan Gao for the reply. We're using heartbeat 2.1.3-0.9 running under
> zVM 5.4 / SLES10-SP2. So I guess I have to use cibadmin. So here goes:
>
> 1. # cibadmin -Q | grep node
> value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
> type="normal"/>
> type="normal"/>
> type="normal"/>
> type="normal"/>
> type="normal"/>
> crmd="online" crm-debug-origin="do_lrm_query" shutdown="0" in_ccm="true"
> ha="active" join="member" expected="member">
> ha="active" crm-debug-origin="do_update_resource" crmd="online" shutdown="0"
> in_ccm="true" join="member" expected="member">
>
> 2. I then run:
> cibadmin -D -o nodes -X '<node uname="lnodbbt" type="normal"/>'
> Call cib_delete failed (-42): Write requires quorum

Hmm, that modification does need quorum for heartbeat-2.1. Please try the "-f" option.

> 3. What now? The node that I am trying to delete belongs to another
> cluster.
>
> TIA
>
> On Wed, 2009-08-05 at 20:42 -0400, Bernie Wu wrote:
> > Hi Listers,
> > How can I remove nodes that currently appear in my Linux HA Management
> > Client?
>
> If it's a heartbeat based cluster, first you should run hb_delnode to
> delete the nodes.
>
> And then delete them from the cib:
> If you are using the latest cluster stack, you could either delete them
> via the GUI if you have pacemaker-mgmt installed, or run "crm node
> delete ...".
> If you are still using heartbeat-2.1, you have to run cibadmin to delete
> them.
>
> > These nodes belong to another cluster and they appear as stopped.
> >
> > TIA
> > Bernie

Regards,
Yan Gao
China R&D Software Engineer
y...@novell.com
Novell, Inc.
Making IT Work As One™

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
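Putting the thread's advice together, the removal is hb_delnode followed by a forced cibadmin delete. Since both commands need a live heartbeat 2.1 cluster, the sketch below only assembles and prints them (a dry run); the node name is taken from the thread, everything else is illustrative:

```shell
# Dry-run sketch of the node removal discussed above.
NODE="lnodbbt"
NODE_XML="<node uname=\"${NODE}\" type=\"normal\"/>"

# Step 1: remove the node from heartbeat's node list.
echo "hb_delnode ${NODE}"
# Step 2: delete it from the CIB; -f forces the write despite missing quorum.
echo "cibadmin -f -D -o nodes -X '${NODE_XML}'"
```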