Re: [Pacemaker] Unable to stop Multi state resource
On Tue, Apr 19, 2011 at 12:34 PM, Rakesh K wrote:
> Hi Andrew
>
> FSR is a file system replication script that adheres to the OCF cluster
> framework. The script is similar to the MySQL OCF script: it is a
> multi-state resource where the master runs an SSH server and the slave
> runs rsync scripts that synchronize the data between the master and the
> slave.
>
> The rsync script is configured with the master FSR location, so the
> rsync tool regularly replicates the data from the FSR master location.
>
> Here is the crm configure show output

Thanks, but this doesn't really answer my question about whether the
cluster tried to stop it.
Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources
On Tue, Apr 19, 2011 at 12:40 PM, Rakesh K wrote:
> Andrew Beekhof writes:
>>
>> There is nothing in this config that requires tomcat2 to be stopped.
>>
>> Perhaps:
>>    colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
>> was intended to be:
>>    colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
>>
>> The only other service active is httpd, which also has no constraints
>> indicating it should stop when mysql is down.
>
> Thanks Andrew for the valuable feedback.
>
> As suggested, I changed the colocation constraint but am still facing
> the same issue.
>
> As per the order given in the HA configuration, I am providing the
> output of my crm configure show command.

Not enough, sorry. I need the status section too.

    crm configure show xml
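Something like the following should capture both pieces of information
being asked for (a sketch; cibadmin -Q dumps the entire CIB, status
section included, while the second command prints only the configuration):

    cibadmin -Q > /tmp/cib.xml      # full CIB: <configuration> plus <status>
    crm configure show xml          # the configuration section only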
Re: [Pacemaker] crm : unknown expected votes
On Tue, Apr 19, 2011 at 3:37 PM, wrote:
> Hi,
>
> I created a 2 node cluster using pacemaker on Fedora 14
> (2.6.35.6-45.fc14.x86_64).
>
> I have two errors that I am not able to resolve. Can someone help me
> resolve them?
>
> 1) It always shows "unknown expected votes" when I run 'crm status'.

Not an error. Heartbeat-based clusters do not use this.

> 2) The logfile shows the message below even though stonith is not
> enabled:
>
>    Error: te_connect_stonith: Attempting connection to fencing daemon...

Disabling stonith does not impact whether the daemon is started, nor
whether we connect to it.

Google is your friend:
http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg16967.html

> Pasted below the configure and status:
>
> ===
> -bash-4.1# crm configure show
> node $id="2e9dd3fa-8083-4363-96b4-331aa9b93d1f" rabbithanode2
> node $id="3a56dae9-d8c7-46b0-8a86-f6bd3b9658f4" rabbithanode1
> primitive bunny ocf:rabbitmq:rabbitmq-server \
>         params mnesia_base="/cluster1"
> primitive drbd ocf:linbit:drbd \
>         params drbd_resource="wwwdata" \
>         op monitor interval="60s"
> primitive drbd_fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd1" directory="/cluster1" fstype="ext4"
> ms drbd_ms drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation bunny_on_fs inf: bunny drbd_fs
> colocation fs_on_drbd inf: drbd_fs drbd_ms:Master
> order bunny_after_fs inf: drbd_fs bunny
> order fs_after_drbd inf: drbd_ms:promote drbd_fs:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065" \
>         cluster-infrastructure="Heartbeat" \
>         stonith-enabled="false" \
>         resource-stickiness="100" \
>         no-quorum-policy="ignore"
> ===
>
> -bash-4.1# crm status
>
> Last updated: Tue Apr 19 09:32:52 2011
> Stack: Heartbeat
> Current DC: rabbithanode2 (2e9dd3fa-8083-4363-96b4-331aa9b93d1f) - partition with quorum
> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
> 2 Nodes configured, unknown expected votes
> 3 Resources configured.
>
> Online: [ rabbithanode1 rabbithanode2 ]
>
> Master/Slave Set: drbd_ms [drbd]
>     Masters: [ rabbithanode1 ]
>     Slaves: [ rabbithanode2 ]
> drbd_fs (ocf::heartbeat:Filesystem):    Started rabbithanode1
> bunny   (ocf::rabbitmq:rabbitmq-server):        Started rabbithanode1
> -bash-4.1#
> ===
>
> Thanks & Regds
> Hari Tatituri
Re: [Pacemaker] Question of the syslog output in pacemaker-1.1
Hi, Andrew

(2011/04/19 18:13), Andrew Beekhof wrote:
> On Tue, Apr 19, 2011 at 9:25 AM, Yuusuke IIDA wrote:
>> Hi, Andrew
>>
>> I use corosync-1.3.0 and Pacemaker-1.1.5, and the logs are written via
>> rsyslog. I changed syslog_facility in corosync.conf to local1,
>> intending to send the cluster's logs to a dedicated file. However, the
>> setting was not applied to the processes forked by pacemakerd: their
>> facility remained "daemon". Only corosync and pacemakerd themselves
>> picked up the changed setting.
>>
>> Why is the syslog_facility setting from corosync.conf ineffective in
>> the processes forked by pacemakerd? Please tell me how to change the
>> facility of those processes.
>
> Ah, I see the problem.  The following patch will be in devel shortly:
>
> diff -r 7225f68ae6e9 mcp/pacemaker.c
> --- a/mcp/pacemaker.c   Mon Apr 18 16:52:22 2011 +0200
> +++ b/mcp/pacemaker.c   Tue Apr 19 11:12:09 2011 +0200
> @@ -692,7 +692,7 @@ main(int argc, char **argv)
>          crm_make_daemon(crm_system_name, TRUE, pid_file);
>
>          /* Only Re-init if we're running daemonized */
> -        crm_log_init_quiet(NULL, LOG_INFO, TRUE, FALSE, argc, argv);
> +        crm_log_init(NULL, LOG_INFO, TRUE, FALSE, 0, NULL);
>      }
>
>      crm_info("Starting Pacemaker %s (Build: %s): %s\n", VERSION, BUILD_VERSION, CRM_FEATURES);

I confirmed that the problem was solved by this correction:
http://hg.clusterlabs.org/pacemaker/devel/rev/b162c2b84e16

Thank you for the quick response.

Best Regards,
Yuusuke IIDA
--
METRO SYSTEMS CO., LTD
Yuusuke Iida
Mail: iiday...@intellilink.co.jp
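For reference, the corosync.conf setting under discussion looks roughly
like the following (a sketch; the file paths and the rsyslog routing line
are illustrative assumptions, not taken from the thread):

    # /etc/corosync/corosync.conf
    logging {
            to_syslog: yes
            syslog_facility: local1
    }

    # /etc/rsyslog.conf - route the cluster messages to their own file
    local1.*        /var/log/cluster.log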
Re: [Pacemaker] Resources won't start
>> Did it start?

No, here is the output; all the resources kind of went away. That's what
I've been fighting all day...

Last updated: Tue Apr 19 13:52:18 2011
Stack: openais
Current DC: CentClus2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ CentClus1 CentClus2 ]

That's it!

Config:

node CentClus1
node CentClus2
primitive FS_disk ocf:heartbeat:Filesystem \
        params device="/dev/VolGroup01/Shared1" directory="/data" fstype="gfs"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.90" cidr_netmask="32" \
        op monitor interval="30s"
primitive ISCSI_disk ocf:heartbeat:iscsi \
        params portal="192.168.1.79:3260" target="iqn.1991-05.com.microsoft:wss-w2xxxvr-wss-target3-target" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="120s" timeout="30s"
primitive VG_disk ocf:heartbeat:LVM \
        params volgrpname="VolGroup01" exclusive="yes" \
        op monitor interval="10" timeout="30" on-fail="restart" depth="0" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive IP_ping ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.90" cidr_netmask="32" \
        op monitor interval="30s"
primitive PM_ping ocf:pacemaker:ping \
        params name="p_ping" host_list="192.168.1.91 192.168.1.92 192.168.1.1" \
        op monitor interval="15s" timeout="30s"
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping IP_ping
location Loc_ping CL_group \
        rule $id="loc_ping-rule" -inf: not_defined PM_ping or PM_ping lte 0
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"

PHIL HUNT
AMS Consultant
phil.h...@orionhealth.com

----- Original Message -----
From: "mark - pacemaker list"
To: "The Pacemaker cluster resource manager"
Sent: Tuesday, April 19, 2011 5:05:16 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Pacemaker] Resources won't start

Hi Phil,

On Tue, Apr 19, 2011 at 3:36 PM, Phil Hunt wrote:
> Hi
> I have iscsid running, no iscsi.

Good. You don't want the system to auto-connect the iSCSI disks on boot;
pacemaker will do that for you.

> Here is the crm status:
>
> Last updated: Tue Apr 19 12:39:03 2011
> Stack: openais
> Current DC: CentClus2 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
>
> Online: [ CentClus1 CentClus2 ]
>
> Resource Group: CL_group
>     ClusterIP   (ocf::heartbeat:IPaddr2):       Started CentClus2
>     FS_disk     (ocf::heartbeat:Filesystem):    Stopped
>     ISCSI_disk  (ocf::heartbeat:iscsi):         Stopped
>     VG_disk     (ocf::heartbeat:LVM):           Stopped
>     PM_ping     (ocf::pacemaker:ping):          Stopped
>     IP_ping     (ocf::heartbeat:IPaddr2):       Stopped

The resources are listed in top-down start order. So you're starting
ClusterIP, but then trying to start the filesystem when you still haven't
connected to the iSCSI disk or started the volume group.

> Here is the crm config:
> node CentClus1
> group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping
> order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk

You have conflicting orders there. A resource group defines an order, so
the other order statement seems unnecessary.

If you remove that order constraint, and change your group line so that
things start in the correct order, does it come up?

    group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping

I left off IP_ping, because it's exactly the same as ClusterIP. Was it
meant to be something else?

Regards,
Mark
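Put as crm shell commands, Mark's suggestion amounts to something like
this (a sketch using the resource and constraint names from Phil's
config):

    # the group already defines an order, so drop the separate constraint
    crm configure delete ISCSI_startup
    # then fix the member order, e.g. via "crm configure edit", so the
    # group line reads:
    #   group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping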
Re: [Pacemaker] Resources won't start
Hi Phil,

On Tue, Apr 19, 2011 at 3:36 PM, Phil Hunt wrote:
> Hi
> I have iscsid running, no iscsi.

Good. You don't want the system to auto-connect the iSCSI disks on boot;
pacemaker will do that for you.

> Here is the crm status:
>
> Last updated: Tue Apr 19 12:39:03 2011
> Stack: openais
> Current DC: CentClus2 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
>
> Online: [ CentClus1 CentClus2 ]
>
> Resource Group: CL_group
>     ClusterIP   (ocf::heartbeat:IPaddr2):       Started CentClus2
>     FS_disk     (ocf::heartbeat:Filesystem):    Stopped
>     ISCSI_disk  (ocf::heartbeat:iscsi):         Stopped
>     VG_disk     (ocf::heartbeat:LVM):           Stopped
>     PM_ping     (ocf::pacemaker:ping):          Stopped
>     IP_ping     (ocf::heartbeat:IPaddr2):       Stopped

The resources are listed in top-down start order. So you're starting
ClusterIP, but then trying to start the filesystem when you still haven't
connected to the iSCSI disk or started the volume group.

> Here is the crm config:
> node CentClus1
> group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping
> order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk

You have conflicting orders there. A resource group defines an order, so
the other order statement seems unnecessary.

If you remove that order constraint, and change your group line so that
things start in the correct order, does it come up?

    group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping

I left off IP_ping, because it's exactly the same as ClusterIP. Was it
meant to be something else?

Regards,
Mark
[Pacemaker] Resources won't start
Hi

I've been having a lot of problems figuring this out. In the enclosed
config for a 2 node cluster (two RHEL5 boxes working as a cluster with a
shared iSCSI disk hosted on a Windows Storage Server box), the resources
will not start.

I have iscsid running, no iscsi. I started modifying things because the
iscsi resource would connect to the target and the LVM would work, but
the disk would not mount. If I mounted it manually and did a resource
cleanup on the FS resource, it said it was fine. Something I did really
messed it all up.

I am very new at this, so any help with this problem in my config would
be appreciated. The goal is to have the 2 systems share an iSCSI disk,
with all resources running in one group with a VIP address, and the disk
moving to the other server on failure.

Here is the crm status:

Last updated: Tue Apr 19 12:39:03 2011
Stack: openais
Current DC: CentClus2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ CentClus1 CentClus2 ]

Resource Group: CL_group
    ClusterIP   (ocf::heartbeat:IPaddr2):       Started CentClus2
    FS_disk     (ocf::heartbeat:Filesystem):    Stopped
    ISCSI_disk  (ocf::heartbeat:iscsi):         Stopped
    VG_disk     (ocf::heartbeat:LVM):           Stopped
    PM_ping     (ocf::pacemaker:ping):          Stopped
    IP_ping     (ocf::heartbeat:IPaddr2):       Stopped

Here is the crm config:

node CentClus1
node CentClus2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.90" cidr_netmask="32" \
        op monitor interval="30s"
primitive FS_disk ocf:heartbeat:Filesystem \
        params device="/dev/VolGroup01/Shared1" directory="/data" fstype="gfs"
primitive IP_ping ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.90" cidr_netmask="32" \
        op monitor interval="30s"
primitive ISCSI_disk ocf:heartbeat:iscsi \
        params portal="192.168.1.79:3260" target="iqn.1991-05.com.microsoft:wss-w2xxsvr-wss-target3-target" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="120s" timeout="30s"
primitive PM_ping ocf:pacemaker:ping \
        params name="p_ping" host_list="192.168.1.91 192.168.1.92 192.168.1.1" \
        op monitor interval="15s" timeout="30s"
primitive VG_disk ocf:heartbeat:LVM \
        params volgrpname="VolGroup01" exclusive="yes" \
        op monitor interval="10" timeout="30" on-fail="restart" depth="0" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping
location Loc_ping CL_group \
        rule $id="loc_ping-rule" -inf: not_defined PM_ping or PM_ping lte 0
order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"

PHIL HUNT
AMS Consultant
phil.h...@orionhealth.com
Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'
On 04/19/2011 10:38 AM, Marek Marczykowski wrote:
>> in your opinion, is it possible to fix this via the ocf ra or does it
>> have to be a separate cronjob?
>
> I have no idea how to do it in the ra. There is no easy way to look at
> what binlogs are on the other node. Maybe some trick storing that info
> on the monitor action, but this is ugly and makes the ra depend on the
> monitor action being enabled...

The easiest solutions are the best :)

What about submitting a "show master logs" query to the to-be master,
checking the available logs and refusing to start if the log file has
disappeared?

cheers,
raoul
--
DI (FH) Raoul Bhatia M.Sc.          email. r.bha...@ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OG          web.   http://www.ipax.at
Barawitzkagasse 10/2/2/11           email. off...@ipax.at
1190 Wien                           tel.   +43 1 3670030
FN 277995t HG Wien                  fax.   +43 1 3670030 15
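A minimal sketch of that idea, as it might sit in an RA's start path (the
variable names and connection details here are illustrative assumptions,
not part of the shipped mysql RA):

    # ask the to-be master which binlogs it still has
    available_logs=$($MYSQL -h "$new_master_host" \
            -u "$OCF_RESKEY_replication_user" \
            -p"$OCF_RESKEY_replication_passwd" \
            -e 'SHOW MASTER LOGS' | awk 'NR > 1 { print $1 }')
    # refuse to start if the binlog we last replicated from is gone
    echo "$available_logs" | grep -qx "$last_master_log_file" \
            || return $OCF_ERR_GENERIC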
[Pacemaker] Resource Agents 1.0.4: HA LVM Patch
Hi,

I attached a patch to enhance the LVM agent with the capability to set a
tag on the VG (set_hosttag = true). In conjunction with a volume_list
filter this can prevent a VG from being activated on multiple hosts.
Unfortunately, already-active VGs will stay active in case of an unclean
operation. The tag is always the hostname.

Some configuration hints can be found here:
http://sources.redhat.com/cluster/wiki/LVMFailover

Cheers,
Ulf

[attachment: LVM.patch]
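The volume_list side of this scheme (described on the wiki page above)
looks roughly like the following in /etc/lvm/lvm.conf; "vg_system" and
"node1" are placeholders for the local root VG and this host's tag:

    activation {
            # only activate the local root VG and VGs tagged for this host
            volume_list = [ "vg_system", "@node1" ]
    }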
[Pacemaker] crm : unknown expected votes
Hi,

I created a 2 node cluster using pacemaker on Fedora 14
(2.6.35.6-45.fc14.x86_64).

I have two errors that I am not able to resolve. Can someone help me
resolve them?

1) It always shows "unknown expected votes" when I run 'crm status'.

2) The logfile shows the message below even though stonith is not
enabled:

   Error: te_connect_stonith: Attempting connection to fencing daemon...

Pasted below the configure and status:

===
-bash-4.1# crm configure show
node $id="2e9dd3fa-8083-4363-96b4-331aa9b93d1f" rabbithanode2
node $id="3a56dae9-d8c7-46b0-8a86-f6bd3b9658f4" rabbithanode1
primitive bunny ocf:rabbitmq:rabbitmq-server \
        params mnesia_base="/cluster1"
primitive drbd ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive drbd_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/cluster1" fstype="ext4"
ms drbd_ms drbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation bunny_on_fs inf: bunny drbd_fs
colocation fs_on_drbd inf: drbd_fs drbd_ms:Master
order bunny_after_fs inf: drbd_fs bunny
order fs_after_drbd inf: drbd_ms:promote drbd_fs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        resource-stickiness="100" \
        no-quorum-policy="ignore"
===

-bash-4.1# crm status

Last updated: Tue Apr 19 09:32:52 2011
Stack: Heartbeat
Current DC: rabbithanode2 (2e9dd3fa-8083-4363-96b4-331aa9b93d1f) - partition with quorum
Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
2 Nodes configured, unknown expected votes
3 Resources configured.

Online: [ rabbithanode1 rabbithanode2 ]

Master/Slave Set: drbd_ms [drbd]
    Masters: [ rabbithanode1 ]
    Slaves: [ rabbithanode2 ]
drbd_fs (ocf::heartbeat:Filesystem):    Started rabbithanode1
bunny   (ocf::rabbitmq:rabbitmq-server):        Started rabbithanode1
-bash-4.1#
===

Thanks & Regds
Hari Tatituri
Re: [Pacemaker] how to get pacemaker:ping recheck before promoting drbd resources on a node
On Tue, Apr 19, 2011 at 11:54 AM, Jelle de Jong wrote:
> On 19-04-11 11:31, Andrew Beekhof wrote:
>> If the underlying messaging/membership layer goes into spasms, there's
>> not much ping can do to help you. What version of corosync have you
>> got? Some versions have been better than others.
>
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> /etc/debian_version 6.0.1 (stable)
>
>> Correct, it's checked periodically.
>
> Can I change the config so that a ping check is done before promoting
> drbd?

No. As I said, you'd need to add this to the agent itself. We just make
sure things are in a certain state before starting/promoting other
resources - we don't call specific actions.

> I tried adding a separate ping0: http://pastebin.com/raw.php?i=2WD1HKnC
> I thought it worked, but ping0 starts and drbd is still promoted,
> probably because ping0 returns a successful start and does not return
> an error when the actual ping failed. So I tried adding additional
> location rules for ping0, but then the resource is not started anymore:
> http://pastebin.com/raw.php?i=DXqRzMNs
>
>> That is something that would need to be added to the drbd agent.
>> Alternatively, configure the ping resource to update more frequently.
>
> How can this be done? crm ra info ocf:ping doesn't show much info. I
> tried using attempts="1" dampen="1" timeout="1" and monitor
> interval="1". An example of how to do a frequent, fast ping would be
> welcome.

A monitor with interval=1, timeout=1 and dampen=0 should give the closest
behavior to what you're after. Make sure interval is not a parameter
though.

> If I can make the ping check fast enough to detect network failures
> before corosync tells pacemaker the other node disappeared/failed, this
> may provide a workaround.
>
>> But you did lose the node. The cluster can't see into the future to
>> know that it will come back in a bit. What token timeouts are you
>> using?
>
> True, but the node should see that its own network is down, recognize
> that it is the one failing, and wait until its network is back to check
> its situation again before doing things with its resources.

The cluster does not understand the network topology in the way you do.

> My corosync.conf with token 3000: http://pastebin.com/Y5Lkf4Ch

Increasing that will tell the cluster to wait a bit longer before
declaring a node dead.

> Thanks in advance,
>
> Any help is much appreciated,
>
> Kind regards,
>
> Jelle de Jong
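Expressed as a crm snippet, that suggestion looks something like this
(host_list is a placeholder; note that dampen is a resource parameter,
while interval belongs on the monitor op):

    primitive ping0 ocf:pacemaker:ping \
            params name="ping0" host_list="192.168.1.1" dampen="0" \
            op monitor interval="1" timeout="1"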
Re: [Pacemaker] A question and demand to a resource placement strategy function
On 04/18/11 18:17, Yuusuke IIDA wrote:
> * When it is not dispersed well
> When I caused failures of the resources in the following order, the
> placement is unbalanced and all the resources end up on one node:
>
> main_rsc3 -> main_rsc2 -> main_rsc1
>
> Online: [srv-b1 srv-b2 srv-a1]
> Full list of resources:
> main_rsc1      (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc2      (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc3      (ocf::pacemaker:Dummy): Started srv-b1
>
> # crm configure ptest utilization
> Utilization information:
> Original: srv-b2 capacity: capacity=3
> Original: srv-b1 capacity: capacity=3
> Original: srv-a1 capacity: capacity=3
> calculate_utilization: main_rsc1 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc2 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc3 utilization on srv-b1: capacity=1
> Remaining: srv-b2 capacity: capacity=3
> Remaining: srv-b1 capacity: capacity=0
> Remaining: srv-a1 capacity: capacity=3
>
> I think that this problem is caused by the order in which the resources
> are handled.

Exactly. The allocation scores are as follows at this time:

native_color: main_rsc1 allocation score on srv-a1: -INFINITY
native_color: main_rsc1 allocation score on srv-b1: 100
native_color: main_rsc1 allocation score on srv-b2: 100
native_color: main_rsc2 allocation score on srv-a1: -INFINITY
native_color: main_rsc2 allocation score on srv-b1: INFINITY
native_color: main_rsc2 allocation score on srv-b2: 100
native_color: main_rsc3 allocation score on srv-a1: -INFINITY
native_color: main_rsc3 allocation score on srv-b1: INFINITY
native_color: main_rsc3 allocation score on srv-b2: 100

And the resources get assigned from top to bottom.

Actually, I've been optimizing the placement-strategy lately. It will
sort the resource processing order according to the priorities and scores
of resources. That should result in ideal placement. Stay tuned.

Regards,
  Yan
--
Yan Gao
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.
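For context, the utilization setup being exercised here can be written in
the crm shell roughly as follows (a sketch matching the node and resource
names above; the strategy value is one of the documented options):

    node srv-a1 utilization capacity="3"
    node srv-b1 utilization capacity="3"
    node srv-b2 utilization capacity="3"
    primitive main_rsc1 ocf:pacemaker:Dummy \
            utilization capacity="1"
    property placement-strategy="balanced"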
Re: [Pacemaker] Pacemaker / Postfix startup problem...
I'll get a chance to work on it today. I'll let you know what happens. :)

Thanks!!

-----Original Message-----
From: Raoul Bhatia [IPAX] [mailto:r.bha...@ipax.at]
Sent: Tuesday, April 19, 2011 5:15 AM
To: The Pacemaker cluster resource manager
Cc: Adam Reiss
Subject: Re: [Pacemaker] Pacemaker / Postfix startup problem...

adam,

any news on this? if this is not working for you, i've got another idea.
but please report the current status first...

thanks,
raoul

On 04/14/2011 08:33 PM, Raoul Bhatia [IPAX] wrote:
> hi adam,
>
> On 14.04.2011 18:10, Adam Reiss wrote:
>> Hi Raoul,
>>
>> We're trying to set up an HA SMTP relay, so having pacemaker
>> stop/start the services as it passes the work over to the other
>> machine, should Postfix fail... Is there a better way to allow an HA
>> SMTP relay?
>
> when we're setting up a clustered postfix, we do not mess with the
> default /etc/postfix/ config but use a different location on a drbd
> backed device instead, e.g. /data/mail/
>
> this way, local mail delivery (cron output!) works without any issue -
> even if the clustered postfix is down (e.g. for maintenance) or simply
> migrated to a different host.
>
>> It's running under VMware, having two different guests, on two
>> different hosts...
>>
>> I've attached the output you've requested. :)
>>
>> There is no syslog file in /var/log .
>
> mhm - your hb_report is incomplete too. i don't know centos - where
> does centos' syslog write its logfiles?
>
> anyways, i've updated the postfix ocf ra to handle some configuration
> cases and errors:
>
> https://github.com/raoulbhatia/resource-agents/tree/master/heartbeat/postfix
>
> depending on your system, you might need to apply the following patch:
>
> -: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> -. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> +: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
> +. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
>
> could you please give it a shot and report what's happening?
>
> if it is still *not* working for you, i would need your current
> configuration, a new hb_report and the system's logfiles.
>
> thanks,
> raoul
[Pacemaker] Announce: Hawk (HA Web Konsole) 0.4.0
Greetings All,

This is to announce version 0.4.0 of Hawk, a web-based GUI for managing
and monitoring Pacemaker High-Availability clusters.

You can use Hawk 0.4.0 to:

- Monitor your cluster, with much the same functionality as crm_mon
  (displays node and resource status, failed ops).
- Perform basic operator tasks:
  - Node: standby, online, fence.
  - Resource: start, stop, migrate, unmigrate, clean up.
- Create, edit and delete primitives, groups, clones, m/s resources.
- Edit crm_config properties.

Hawk is intended to run on each node in your cluster, and is accessible
via HTTPS on port 7630. You can then access it by pointing your web
browser at the IP address of any cluster node, or the address of any
IPaddr(2) resource you may have configured.

You will need to configure a user account to log in as. The same rules
apply as for the python GUI: you need to log in as a user in the
"haclient" group.

Packages for various SUSE-based distros can be obtained from the
network:ha-clustering and network:ha-clustering:Factory repos on OBS, or
you can just search for Hawk on software.opensuse.org:

  http://software.opensuse.org/search?baseproject=ALL&q=Hawk

I don't have Fedora/Red Hat packages yet, but building an RPM from source
is easy:

  # hg clone http://hg.clusterlabs.org/pacemaker/hawk
  # cd hawk
  # hg update tip
  # make rpm

My apologies to non-RPM-based distro users (packaging assistance gladly
accepted!)

Further information is available at:

  http://www.clusterlabs.org/wiki/Hawk

Please direct comments, feedback, questions, etc. to myself and/or
(preferably) the Pacemaker mailing list.

Happy clustering,
Tim
--
Tim Serong
Senior Clustering Engineer, OPS Engineering, Novell Inc.
Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources
Andrew Beekhof writes:
>
> There is nothing in this config that requires tomcat2 to be stopped.
>
> Perhaps:
>    colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
> was intended to be:
>    colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
>
> The only other service active is httpd, which also has no constraints
> indicating it should stop when mysql is down.

Thanks Andrew for the valuable feedback.

As suggested, I changed the colocation constraint but am still facing the
same issue.

As per the order given in the HA configuration, I am providing the output
of my crm configure show command:

node $id="6317f856-e57b-4a03-acf1-ca81af4f19ce" cisco-demomsf
node $id="87b8b88e-3ded-4e34-8708-46f7afe62935" mysql3
primitive Httpd ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd" client="curl" statusurl="http://localhost/img/test.html" testregex="*" \
        op start interval="0" timeout="60s" \
        op monitor interval="50s" timeout="50s" \
        meta target-role="Started"
primitive HttpdVIP ocf:heartbeat:IPaddr3 \
        params ip="172.21.52.149" eth_num="eth0:4" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive Mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" test_passwd="slavepass" test_table="msfha.conn" test_user="repl" replication_user="repl" replication_passwd="slavepass" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="10s" role="Master" timeout="8s" \
        op monitor interval="12s" timeout="8s"
primitive MysqlVIP ocf:heartbeat:IPaddr3 \
        params ip="172.21.52.150" eth_num="eth0:3" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive Tomcat1 ocf:msf:tomcat \
        params tomcat_name="tomcat" statusurl="http://localhost:8080/dbtest/testtomcat.html" java_home="/" catalina_home="/home/msf/runtime/tomcat/apache-tomcat-6.0.18" client="curl" testregex="*" \
        op start interval="0" timeout="60s" \
        op monitor interval="50s" timeout="50s" \
        op stop interval="0" \
        meta target-role="Started"
primitive Tomcat1VIP ocf:heartbeat:IPaddr3 \
        params ip="172.21.52.140" eth_num="eth0:2" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive Tomcat2 ocf:msf:tomcat \
        params tomcat_name="tomcat" statusurl="http://localhost:8081/" java_home="/" catalina_home="/home/msf/runtime/tomcat2/apache-tomcat-6.0.18" client="curl" testregex="*" \
        op start interval="0" timeout="60s" \
        op monitor interval="50s" timeout="50s" \
        op stop interval="0" \
        meta target-role="Started"
primitive Tomcat2VIP ocf:heartbeat:IPaddr3 \
        params ip="172.21.52.139" eth_num="eth0:4" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
ms MS_Mysql Mysql \
        meta notify="true" target-role="Stopped"
location L_Master MS_Mysql \
        rule $id="L_Master-rule" $role="Master" 100: #uname eq cisco-demomsf \
        rule $id="L_Master-rule1" $role="Master" 100: #uname eq mysql3
colocation Httpd-with-ip inf: HttpdVIP Httpd
colocation Mysql-with-ip inf: MysqlVIP MS_Mysql:Master
colocation Tomcat1-with-ip inf: Tomcat1VIP Tomcat1
colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
colocation tomcat2-with-ip inf: Tomcat2VIP Tomcat2
order Httpd-after-Tomcat2 inf: Tomcat2 Httpd
order Httpd-after-op inf: HttpdVIP Httpd
order Mysql-after-ip inf: MysqlVIP MS_Mysql
order Tomcat1-after-MYSQL inf: MS_Mysql Tomcat1VIP
order Tomcat1-after-ip inf: Tomcat1VIP Tomcat1
order Tomcat2-after-ip inf: Tomcat2VIP Tomcat2
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1300787402"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Re: [Pacemaker] Unable to stop Multi state resource
Rakesh K writes:

Hi Andrew

FSR is a file system replication script that adheres to the OCF cluster
framework. The script is similar to the MySQL OCF script: it is a
multi-state resource where the master runs an SSH server and the slave
runs rsync scripts that synchronize the data between the master and the
slave.

The rsync script is configured with the master FSR location, so the
rsync tool regularly replicates the data from the FSR master location.

Here is the crm configure show output:

node $id="82a5281a-a069-49c1-9f57-d4a8f6eb3d72" prodmsf2
node $id="d8b6c2e7-d1c3-4a15-9411-ed4d710c8672" prodmsf
primitive FSR ocf:msf:fsr \
        params client_script="/home/msf/ha/scripts/ocf/rsyncClient" source_dir="/home/msf/services/persistence/" dest_dir="/home/msf/services/persistence/" user="root" pid="/var/run/fsr.pid" rsync_binary="/usr/bin/rsync" rsync_options="-az" rsync_interval="1" config_file="/home/msf/ha/config/ocf/fsr.config" status_dump="/home/msf/ha/status/rsync_client_dump" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="10s" role="Master" timeout="8s" \
        op monitor interval="12s" timeout="8s"
primitive Httpd ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd" client="curl" statusurl="http://localhost/img/test.html" testregex="*" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="50s" timeout="50s"
primitive HttpdVIP ocf:heartbeat:IPaddr3 \
        params ip="10.10.30.103" eth_num="eth0:1" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive Mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" test_passwd="slavepass" test_table="test.conn" test_user="repl" replication_user="repl" replication_passwd="slavepass" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="10s" role="Master" timeout="8s" \
        op monitor interval="12s" timeout="8s"
primitive MysqlVIP ocf:heartbeat:IPaddr3 \
        params ip="10.10.30.105" eth_num="eth0:3" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="60s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive Tomcat1 ocf:msf:tomcat1 \
        params tomcat_name="tomcat" statusurl="http://localhost:8080/" java_home="/" catalina_home="/home/msf/runtime/tomcat/apache-tomcat-6.0.18" client="curl" testregex="*" \
        op start interval="0" timeout="120s" \
        op monitor interval="50s" timeout="50s" \
        op stop interval="0" timeout="120s" \
        meta target-role="Started"
primitive Tomcat1VIP ocf:heartbeat:IPaddr3 \
        params ip="10.10.30.104" eth_num="eth0:2" vip_cleanup_file="/var/run/bigha.pid" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
ms MS_FSR FSR \
        meta notify="true" target-role="Started"
ms MS_Mysql Mysql \
        meta notify="true" target-role="Started"
colocation FSR-with-Tomcat inf: Tomcat1 MS_FSR:Master
colocation Httpd-with-ip inf: HttpdVIP Httpd
colocation Mysql-with-ip inf: MysqlVIP MS_Mysql:Master
colocation Tomcat1-with-ip inf: Tomcat1VIP Tomcat1
order FSR-after-tomcat inf: Tomcat1 MS_FSR
order Httpd-after-ip inf: HttpdVIP Httpd
order Httpd-after-tomcat inf: Tomcat1 HttpdVIP
order Mysql-after-ip inf: MysqlVIP MS_Mysql
order Tomcat1-after-MYSQL inf: MS_Mysql Tomcat1VIP
order Tomcat1-after-ip inf: Tomcat1VIP Tomcat1
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"

Regards
Rakesh
Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources
There is nothing in this config that requires tomcat2 to be stopped.

Perhaps:
   colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
was intended to be:
   colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1

The only other service active is httpd, which also has no constraints
indicating it should stop when mysql is down.

On Tue, Apr 19, 2011 at 12:04 PM, rakesh k wrote:
> Hi Andrew
>
> As mentioned please find the output of cibadmin -Ql in this post.
>
> [cibadmin -Ql XML output: the element tags were stripped by the archive
> and the dump is truncated]