Re: [Linux-HA] pacemaker questions of 4 cases
On Fri, Jan 15, 2010 at 7:23 AM, 梁景明 wrote:
> hi, there are 4 cases in my application where I want to use pacemaker:
> case 1: one tomcat unexpectedly goes down; pacemaker restarts it.
> case 2: the machine serving one tomcat unexpectedly goes down; fail over to another machine and fail back once it recovers.
> case 3: some tomcats only run on certain special nodes; the other nodes must not monitor them.
> case 4: one server application only runs after certain other applications have started, i.e. they start in order.
>
> First I tried to test case 1.
> I built a 4-node environment and put three of the nodes in standby, like this:
>
> Last updated: Fri Jan 15 11:57:49 2010
> Stack: openais
> Current DC: bak1 - partition with quorum
> Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
> 4 Nodes configured, 4 expected votes
> 1 Resources configured.
>
> Node bak1: standby
> Node test1: standby
> Node test2: standby
> Online: [ ubuntu ]
>
> For the tomcat LSB script I used the example from the doc on the wiki. It starts fine on node ubuntu with "sudo sh /etc/init.d/tomcatpace start".
> crm configure:
>
> node bak1 \
>     attributes standby="on"
> node test1 \
>     attributes standby="on"
> node test2 \
>     attributes standby="on"
> node ubuntu
> primitive tomcat lsb:tomcatpace \
>     op monitor interval="10" timeout="30s" \
>     meta migration-threshold="10" target-role="Started"
>
> Since only ubuntu is online, I expected the script to run only on ubuntu; is that right?
> But it fails, and it looks as if all the nodes are running the script:
>
> Node bak1: standby
> Node test1: standby
> Node test2: standby
> Online: [ ubuntu ]
>
> tomcat (lsb:tomcatpace) Started [ bak1 test1 test2 ]
>
> Failed actions:
>     tomcat_monitor_0 (node=bak1, call=2, rc=254, status=complete):
>     tomcat_stop_0 (node=bak1, call=3, rc=254, status=complete):
>     tomcat_monitor_0 (node=test1, call=2, rc=254, status=complete):
>     tomcat_stop_0 (node=test1, call=3, rc=254, status=complete):
>     tomcat_monitor_0 (node=test2, call=2, rc=254, status=complete):
>     tomcat_stop_0 (node=test2, call=3, rc=254, status=complete):

Have a look at:
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active

In your case, the failed actions indicate the script is not LSB compliant:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html

The first thing to do before trying anything else is to fix the script (a rough sketch of the exit codes Pacemaker expects follows at the end of this message).

> Then I added a location rule, but I am not sure about its usage, so I followed the example:
>
> location prefer-ubuntu tomcat \
>     rule $id="prefer-rule" 100: #uname eq ubuntu
>
> Is that the line to prefer the ubuntu node and only run on that node? Current configuration:
>
> node bak1 \
>     attributes standby="on"
> node test1 \
>     attributes standby="on"
> node test2 \
>     attributes standby="on"
> node ubuntu
> primitive tomcat lsb:tomcatpace \
>     op monitor interval="10" timeout="30s" \
>     meta migration-threshold="10" target-role="Started"
> location prefer-ubuntu tomcat \
>     rule $id="prefer-rule" 100: #uname eq ubuntu
>
> But it fails again:
>
> Node bak1: standby
> Node test1: standby
> Node test2: standby
> Online: [ ubuntu ]
>
> tomcat (lsb:tomcatpace) Started [ bak1 test1 test2 ]
>
> Failed actions:
>     tomcat_monitor_0 (node=bak1, call=2, rc=254, status=complete):
>     tomcat_stop_0 (node=bak1, call=3, rc=254, status=complete):
>     tomcat_monitor_0 (node=test1, call=2, rc=254, status=complete):
>     tomcat_stop_0 (node=test1, call=3, rc=254, status=complete):
>     tomcat_monitor_0 (node=test2, call=2, rc=254, status=complete):
>
> Thanks for any help.
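Purely as an illustration of what "LSB compliant" means here, a minimal skeleton of the exit-code behaviour Pacemaker probes for could look like the sketch below. CATALINA_HOME, the pid-file path and the way the pid file gets written are assumptions made for the example, not taken from the tomcatpace script on the wiki:

    #!/bin/sh
    # Hypothetical sketch of an LSB-compliant init script skeleton.
    # CATALINA_HOME and PIDFILE are placeholder values.
    CATALINA_HOME=/opt/tomcat
    PIDFILE=/var/run/tomcatpace.pid

    is_running() {
        # true if the recorded pid exists and the process is alive
        [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
    }

    case "$1" in
      start)
        is_running && exit 0    # starting an already-running service must succeed
        # assumes something arranges for $PIDFILE to be written, e.g. CATALINA_PID
        "$CATALINA_HOME/bin/startup.sh" && exit 0
        exit 1
        ;;
      stop)
        is_running || exit 0    # stopping an already-stopped service must return 0
        "$CATALINA_HOME/bin/shutdown.sh" && rm -f "$PIDFILE" && exit 0
        exit 1
        ;;
      status)
        is_running && exit 0    # running -> 0
        exit 3                  # stopped -> 3; this is what the initial probe relies on
        ;;
      *)
        echo "Usage: $0 {start|stop|status}" >&2
        exit 2
        ;;
    esac

The probe (monitor_0) and stop that Pacemaker runs on every node only behave sanely if the status and stop branches work like this, which is why the standby nodes show failed actions even though tomcat was never started there.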
Re: [Linux-HA] Colocation of 2 resources so that they can't run together
On Fri, Jan 15, 2010 at 8:39 PM, jaspal singla wrote:
> Hello,
>
> Thanks for your prompt response. I also have some doubts; it would be great if these could be cleared up as well.
>
> Please find my inline queries:
>
>>> group group_vz_1 vip_ipaddr2 filesystem1_Filesystem vz1_script \
>>>     meta target-role="started"
>>> group group_vz_2 vip2_ipaddr2 filesystem2_Filesystem vz2_script \
>>>     meta target-role="started"
>>> location location_master group_vz_1 700: node_master
>>> location location_node3 group_vz_2 600: node3
>>> location location_slave_1 group_vz_1 0: node_slave
>>> location location_slave_2 group_vz_2 0: node_slave
>>> colocation colocation_vz_test -inf: group_vz_1 group_vz_2
>>
>> The anti-collocation rule you have is correct, and this should result in the resources not being placed on the same node.
>
> Yes, this configuration is now working for my scenario after a restart of node_slave, but I don't know why the behaviour only became correct after that restart.

Did you attach an hb_report from before you rebooted node_slave?

>> Disabling stonith is not a good idea if you're running shared storage.
>
> For stonith integration, unfortunately I am using old hardware servers without any iLO-style management ports, and my management does not want to invest in an APC power switch.
>
> Please advise: can I use an SSH-based stonith mechanism for my production setup, or is it better to leave my production cluster without stonith?

Neither is an option I would consider appropriate for a production cluster.

>>> rsc_defaults $id="rsc_defaults-options"
>>
>> You may want to enable resource-stickiness to avoid resources shuffling around needlessly.
>
> What is the use of a default resource-stickiness value if we have already defined resource-stickiness on every primitive resource? I have already set resource-stickiness on all of my primitives.

If you have already defined it for every resource, then there is no need to supply a default as well.
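To illustrate the two places the value can live (the score of 100, the resource name dummy_rsc and the Dummy agent below are arbitrary examples, not taken from your configuration):

    # cluster-wide default, used by any resource that does not set its own value
    rsc_defaults $id="rsc_defaults-options" \
            resource-stickiness="100"

    # a per-resource value overrides the default for that resource only
    primitive dummy_rsc ocf:pacemaker:Dummy \
            meta resource-stickiness="100"

Since your primitives already carry their own stickiness, a rsc_defaults entry would only matter for resources that do not set one themselves.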
Re: [Linux-HA] [Openais] Problem with cluster linux HA
On Mon, Jan 18, 2010 at 2:46 PM, Galera, Daniel wrote:
> Hello all,
>
> I have 2 SUSE Linux Enterprise 11 servers with the High Availability Extension. I'm configuring a cluster with 2 nodes and only 1 group of resources to run. I use SBD as STONITH. I set up the cluster correctly without problems.
>
> Now I want to have a clustered application named HPOS. For that I need the following in the group:
> SFEX --> to lock the drive
> LVM --> to activate the VG
> Filesystem --> to mount the 3 filesystems needed
> IP --> to bring the cluster IP online
> and then two LSB resources to run the 2 processes of the HPOS application.
>
> Anyway, the application is not the problem. The problem is that when I test the cluster and, for example, MOVE the resource to the other node (server1), the group goes down and server2 appears as offline with stonith UNCLEAN.

Usually that happens when a resource fails to stop. Please use hb_report to generate a tarball and indicate which node you tried to move the resource to (and how).

> That is what I see from server1; if at that moment I check crm_mon from server2, I see server2 as online but server1 down.

No idea what the problem is.

> Attached is the cluster XML config file.
>
> Attached are the log files of the 2 nodes from when I executed the MOVE RESOURCE that failed.
>
> Am I missing any resource location constraint or anything else that is expected?
>
> Do you have any cluster example so I can configure mine correctly?
>
> regards
>
> Dani
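For comparison, a resource stack like the one described usually ends up looking roughly like the sketch below in crm shell. Every name, device path and parameter here is a placeholder guessed for illustration, not taken from the attached XML:

    primitive sfex_lock ocf:heartbeat:sfex \
            params device="/dev/sdb1" index="1"
    primitive hpos_lvm ocf:heartbeat:LVM \
            params volgrpname="vg_hpos"
    primitive hpos_fs ocf:heartbeat:Filesystem \
            params device="/dev/vg_hpos/lv_data" directory="/opt/hpos" fstype="ext3"
    primitive hpos_ip ocf:heartbeat:IPaddr2 \
            params ip="192.168.1.100" cidr_netmask="24"
    primitive hpos_proc1 lsb:hpos1
    primitive hpos_proc2 lsb:hpos2
    # a group starts its members in the listed order and stops them in reverse,
    # so the LSB resources only start once the lock, VG, filesystem and IP are up
    group grp_hpos sfex_lock hpos_lvm hpos_fs hpos_ip hpos_proc1 hpos_proc2

For the move itself, "crm resource move <resource> <node>" followed later by "crm resource unmove <resource>" is the usual sequence; the logs collected by hb_report around the failed move should show which stop operation went wrong, if that is what happened.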
Re: [Linux-HA] messages from existing heartbeat on the same LAN
On Tue, Jan 19, 2010 at 1:15 PM, Dominik Klein wrote:
> Aclhk Aclhk wrote:
>> On the same LAN, there are already two heartbeat nodes, 136pri and 137sec.
>>
>> I set up another 2 nodes with heartbeat. They keep receiving the following messages:
>>
>> heartbeat[9931]: 2010/01/19_10:53:01 WARN: string2msg_ll: node [136pri] failed authentication
>> heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!
>> heartbeat[9931]: 2010/01/19_10:53:02 WARN: string2msg_ll: node [137sec] failed authentication
>> heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!
>>
>> ha.cf
>> debugfile /var/log/ha-debug
>> logfile /var/log/ha-log
>> logfacility local0
>> bcast eth0
>> keepalive 5
>> warntime 10
>> deadtime 120
>> initdead 120
>> auto_failback off
>> node 140openfiler1
>> node 141openfiler2
>>
>> bcast is on the same interface, eth0, for all nodes.
>>
>> Please advise how to avoid these messages.
>
> Use mcast or ucast instead of bcast?

Or change the port.
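For example, either of the following in ha.cf on the new pair would keep its traffic separate from the existing 136pri/137sec cluster (the IP addresses and port number are illustrative placeholders, not values from your network):

    # option 1: unicast directly between the two new nodes instead of broadcast
    ucast eth0 192.168.0.140
    ucast eth0 192.168.0.141

    # option 2: keep bcast (or use mcast) but move the new cluster to another UDP port;
    # udpport must appear before the media directive, and the heartbeat default is 694
    udpport 695
    bcast eth0

With a separate transport the new nodes stop seeing the other cluster's packets, so the authentication warnings go away.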
Re: [Linux-HA] messages from existing heartbeat on the same LAN
Aclhk Aclhk wrote:
> On the same LAN, there are already two heartbeat nodes, 136pri and 137sec.
>
> I set up another 2 nodes with heartbeat. They keep receiving the following messages:
>
> heartbeat[9931]: 2010/01/19_10:53:01 WARN: string2msg_ll: node [136pri] failed authentication
> heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!
> heartbeat[9931]: 2010/01/19_10:53:02 WARN: string2msg_ll: node [137sec] failed authentication
> heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!
>
> ha.cf
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility local0
> bcast eth0
> keepalive 5
> warntime 10
> deadtime 120
> initdead 120
> auto_failback off
> node 140openfiler1
> node 141openfiler2
>
> bcast is on the same interface, eth0, for all nodes.
>
> Please advise how to avoid these messages.

Use mcast or ucast instead of bcast?
[Linux-HA] messages from existing heartbeat on the same LAN
On the same LAN, there are already two heartbeat nodes, 136pri and 137sec.

I set up another 2 nodes with heartbeat. They keep receiving the following messages:

heartbeat[9931]: 2010/01/19_10:53:01 WARN: string2msg_ll: node [136pri] failed authentication
heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!
heartbeat[9931]: 2010/01/19_10:53:02 WARN: string2msg_ll: node [137sec] failed authentication
heartbeat[9931]: 2010/01/19_10:53:02 WARN: Invalid authentication type [1] in message!

ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
bcast eth0
keepalive 5
warntime 10
deadtime 120
initdead 120
auto_failback off
node 140openfiler1
node 141openfiler2

bcast is on the same interface, eth0, for all nodes.

Please advise how to avoid these messages.