[Linux-HA] Where did coros...@lists.osdl.org go?
Hi!

The corosync-overview man page still has this address, but it seems to have gone:

  lists.linux-foundation.org[140.211.169.51] said: 550 5.1.1 coros...@lists.osdl.org... User unknown

Does anybody know the current address? I hope the project is not dead...

Regards,
Ulrich
[Linux-HA] Antw: Re: DO NOT start using heartbeat 2.x in crm mode, but just use Pacemaker, please! [was: managing resource httpd in heartbeat]
Lars Marowsky-Bree l...@suse.de wrote on 19.05.2011 at 13:02 in message 20110519110256.gl26...@suse.de:

[...] Of course. And while our esteemed SLES10 customers are still fully supported on our maintained 2.1.4-fixed version, I personally believe everyone should move swiftly to a newer code base (say, SLE HA 11 SP1).

No, we are waiting for SP2 ;-)

Ulrich
[Linux-HA] Antw: Re: SBD and SFEX on one shared (partitioned) disk?
Lars Marowsky-Bree l...@suse.de wrote on 19.05.2011 at 13:03 in message 20110519110338.gm26...@suse.de:

On 2011-05-19T11:24:23, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

Hi! From what I've read about SBD and SFEX, I could use one disk for both of them, if SBD and SFEX each get a partition on the disk. Right? Reason: the minimum size of a disk on our SAN is 1GB, and it's quite wasteful to use 1GB just for SBD. Doing some calculation, 1MB for SBD should be enough for about any number of cluster nodes, and 900MB should be enough for more than 1000 resources to control.

Well, yes. I'm not quite sure why you'd want to use sfex though if you have sbd fencing anyway.

SBD is for node fencing only. If I need to ensure exclusive assignment of shared storage resources (well, you never know what the cluster stuff tries to do) to avoid data corruption (e.g. through MD-RAID), I feel the need for cluster-wide mutex locks.

Regards,
Ulrich
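A minimal sketch of such a split, run as root on one node (the device name /dev/mapper/shared and the partition sizes are examples, not anything from this thread):

  # parted -s /dev/mapper/shared mklabel gpt
  # parted -s /dev/mapper/shared mkpart sbd 1MiB 9MiB
  # parted -s /dev/mapper/shared mkpart sfex 9MiB 100%

A few MiB is ample for the SBD message slots, and the remainder can hold the SFEX lock metadata. The two partitions are then initialised separately (the -part1/-part2 names are how multipath typically exposes them; adjust to your setup):

  # sbd -d /dev/mapper/shared-part1 create
  # sfex_init /dev/mapper/shared-part2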
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
On Thu, May 19, 2011 at 03:46:37PM -0700, Randy Katz wrote:

To clarify, I was not seeking a quick response. I just noticed the threads I searched were NEVER answered, with the problem that I reported. That being said, and about standby: why does my node come up as standby and not as online?

Because you put it there. The standby setting (like a few others) can take a lifetime, and usually that defaults to forever, though you can explicitly specify an "until reboot", which actually means until restart of the cluster system on that node.

Is there a setting in my conf file that affects that? Or is it another issue, is it configuration? Please advise. Thanks, Randy

PS - Here are some threads where it seems they were never answered, one going back 3 years:
http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html

Then they probably have been solved off list, via IRC or support, or by the original user finally having a facepalm experience. Besides, yes, it happens that threads go unanswered, most of the time because the question was badly asked ("does not work. why?"), and those that could figure it out have been distracted by more important things, or decided that, at that time, trying to figure it out was too time consuming. That's life. If it happens to you, do a friendly bump, and/or try to ask a smarter version of the question ;-)

Most of the time, the answer is in the logs and the config. But please break down the issue to a minimal configuration, and post that minimal config plus logs of one incident. Don't post your 2 MB xml config plus a 2G log and expect people to dig through that for fun. BTW, none of the quoted threads has anything to do with your experience, afaics.

On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:

ps - I searched a lot online and I see this issue coming up,

I doubt that _this_ issue comes up that often ;-)

and then after about 3-4 emails they request the resources and constraints and then there is never an answer to the thread, why?!

Hey, it's not even a day since you provided the config. People have day jobs. People get _paid_ to do support on these kinds of things, so they probably first deal with requests from paying customers. If you need SLAs, you may need to check out a support contract. Otherwise you need to be patient.

From what I read, you probably just have misunderstood some concepts. Standby is not what I think you think it is ;-) Standby is NOT for deciding where resources will be placed. Standby is for manually switching a node into a mode where it WILL NOT run any resources. And it WILL NOT leave that state by itself. It is not supposed to. You switch a node into standby if you want to do maintenance on that node, do major software, system or hardware upgrades, or otherwise expect that it won't be useful to run resources there. It won't even run DRBD secondaries. It will run nothing there.

If you want automatic failover, DO NOT put your nodes in standby. Because, if you do, they cannot take over resources. You have to have your nodes online for any kind of failover to happen. If you want to have a preferred location for your resources, use location constraints. Does that help?
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
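For that last point, a minimal sketch of a location constraint in crm shell syntax (the group name WebServices and node name ha1.iohost.com are taken from later in this thread; the score of 100 is just an example):

  # crm configure location WebServices_prefers_ha1 WebServices 100: ha1.iohost.com

With a finite score like this the group prefers ha1 but can still fail over to the other node; a standby node, by contrast, runs nothing at all.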
[Linux-HA] Antw: Re: Setting up SBD resources in SLES11
Lars Marowsky-Bree l...@suse.de wrote on 19.05.2011 at 13:15 in message 20110519111526.gn26...@suse.de:

On 2011-05-19T09:19:42, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

Hi! I had the doubt that setting up the SBD resources is described correctly in the High Availability Guide of SLES 11. My comment (to Novell, I think) was: "Shouldn't there be a resource per node? Following the procedure, the resource just starts on an arbitrary node." If there is one primitive per node, you'll need a location constraint to avoid multiple primitives running on the same node, right?

No, one external/sbd resource per device (which usually means: per cluster) is sufficient. And you do not need to clone it.

From the description: "sbd uses a shared storage device as a medium to communicate fencing requests. This allows clusters without network power switches; the downside is that access to the shared storage device becomes a Single Point of Failure." So the sbd resource distributes the fencing requests. Now what if the node where sbd runs is in the minority (non-quorum)? How can the rest of the cluster tell the minority to fence (in case of a networking failure)? AFAIK, as long as the storage is reachable, the sbd daemons will just be happy. Maybe it's confusing that an sbd daemon runs on every node, but the sbd resource only runs on one node. Some more words of documentation might help here. Regards, Ulrich

Another book uses a clone resource for SBD (which seems to make sense).

No, it doesn't. ;-) What value does that provide?

For all who don't have the text at hand, here's what the guide writes about SBD setup (page 194):

---snip
Configuring the Fencing Resource
1 To complete the SBD setup, it is necessary to activate SBD as a STONITH/fencing mechanism in the CIB as follows:
# crm configure
crm(live)configure# property stonith-enabled=true
crm(live)configure# property stonith-timeout=30s
crm(live)configure# primitive stonith_sbd stonith:external/sbd params sbd_device=/dev/SBD
crm(live)configure# commit
crm(live)configure# quit
---snip

Yes, and that's enough. The documentation is correct on this.

Regards,
Lars
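As an aside, whether the per-node sbd daemons can see the device is easy to check with the sbd tool itself; a quick sketch, assuming the device is /dev/SBD as in the guide (node1 is a placeholder for a real node name):

  # sbd -d /dev/SBD dump
  # sbd -d /dev/SBD list
  # sbd -d /dev/SBD message node1 test

dump prints the on-disk header, list shows the allocated message slots per node, and the test message should show up in the target node's sbd log without doing anything harmful.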
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
Lars,

Thank you much for the answer on the standby issue. It seems that that was the tip of my real issue. So now I have both nodes coming online, and it seems ha1 starts fine with all the resources starting. With them both online, if I issue:

  crm node standby ha1.iohost.com

then I see IP takeover on ha2, but the other resources do not start, ever. It remains:

Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
Online: [ ha2.iohost.com ]
 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):       Started ha2.iohost.com
     ip1arp     (ocf::heartbeat:SendArp):       Started ha2.iohost.com
     fs_webfs   (ocf::heartbeat:Filesystem):    Stopped
     fs_mysql   (ocf::heartbeat:Filesystem):    Stopped
     apache2    (lsb:httpd):                    Stopped
     mysql      (ocf::heartbeat:mysql):         Stopped
 Master/Slave Set: ms_drbd_mysql
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_mysql:0 ]
 Master/Slave Set: ms_drbd_webfs
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_webfs:0 ]

In looking in the recent log I see this:

May 20 12:46:42 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere

I am not sure why it cannot promote the other resources on ha2; I checked drbd before putting ha1 on standby and it was up to date. Here are the surrounding log entries; the only thing I changed in the config is standby=off on both nodes:

May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print: Resource Group: WebServices
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: apache2 (lsb:httpd): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: mysql (ocf::heartbeat:mysql): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Stopped: [ drbd_mysql:0 ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha1.iohost.com ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1arp: Breaking dependency loop at ip1
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1: Breaking dependency loop at ip1arp
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_mysql: Rolling back scores from apache2
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: apache2: Rolling back scores from mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource apache2 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master

Regards,
Randy
Re: [Linux-HA] [Pacemaker] Announce: Hawk (HA Web Konsole) 0.4.1
On 19/05/11 00:43, Tim Serong wrote:

Hi Everybody, This is to announce version 0.4.1 of Hawk, a web-based GUI for managing and monitoring Pacemaker High-Availability clusters. [...] Building an RPM for Fedora/Red Hat is still just as easy as last time:

  # hg clone http://hg.clusterlabs.org/pacemaker/hawk
  # cd hawk
  # hg update hawk-0.4.1
  # make rpm

*ahem* It /would/ still be just as easy if I had said "hg update tip", or, in this specific instance, "hg update 398ae27386e" (the Makefile grabs the last tag from hg to use as a version number, which is one commit *after* the actual tagged commit).

Regards,
Tim

--
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.
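So, following Tim's correction, the sequence that works for this release would be (the revision id is the one he cites):

  # hg clone http://hg.clusterlabs.org/pacemaker/hawk
  # cd hawk
  # hg update 398ae27386e
  # make rpm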
Re: [Linux-HA] Where did coros...@lists.osdl.org go?
On 20/05/11 16:09, Ulrich Windl wrote:

Hi! The corosync-overview man page still has this address, but it seems to have gone: lists.linux-foundation.org[140.211.169.51] said: 550 5.1.1 coros...@lists.osdl.org... User unknown. Does anybody know the current address? I hope the project is not dead...

Sounds like a bug in the manpage. That should be: open...@lists.osdl.org (see http://corosync.org/doku.php?id=support)

Regards,
Tim

--
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
On Thu, May 19, 2011 at 11:53:24PM -0700, Randy Katz wrote:

Lars, Thank you much for the answer on the standby issue. It seems that that was the tip of my real issue. So now I have both nodes coming online, and it seems ha1 starts fine with all the resources starting. With them both online, if I issue: crm node standby ha1.iohost.com

Why? Learn about crm resource move (and unmove, for that matter).

Then I see IP takeover on ha2, but the other resources do not start, ever. It remains: [...]

In looking in the recent log I see this:
May 20 12:46:42 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
I am not sure why it cannot promote the other resources on ha2; I checked drbd before putting ha1 on standby and it was up to date.

Double check the status of drbd:
  # cat /proc/drbd

Check what the cluster would do, and why:
  # ptest -LVVV -s
[add more Vs to see more detail, but brace yourself for maximum confusion ;-)]

Check for constraints that get in the way:
  # crm configure show | grep -Ee 'location|order'

Check the master scores in the cib:
  # cibadmin -Ql -o status | grep master

Look at the actions that have been performed on the resource, on both nodes:
                    vv-- the ID of your primitive
  # grep lrmd:.*drbd_mysql /var/log/ha.log
or wherever that ends up on your box.

Here are the surrounding log entries; the only thing I changed in the config is standby=off on both nodes: [full pengine log quoted above]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1arp: Breaking dependency loop at ip1
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1: Breaking dependency loop at ip1arp

You got a dependency loop? Maybe you should fix that?

You put some things in a group in a specific order, then you specify the reverse order in explicit order and colocation constraints. That is not particularly useful. Either use a group, or use explicit order/colocation constraints; don't try to use both for the same resources. But that's nothing that would affect DRBD at this point. And as long as your DRBD is not (or cannot?) be promoted, nothing that depends on it will run, obviously.

May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
[... remainder of the pengine log quoted above ...]
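To make Lars's last point concrete, here is a minimal sketch in crm shell syntax (the resource names are the ones from this thread; the constraints shown are illustrative, not Randy's actual configuration). The group alone already means "start the members in this order, on the same node":

  group WebServices ip1 ip1arp fs_webfs fs_mysql apache2 mysql

so an extra order or colocation constraint between two group members is redundant at best, and if it contradicts the group order it can produce exactly the kind of dependency loop flagged above. Constraints tying the filesystems to the DRBD masters are a different matter and still belong in the configuration, for example:

  colocation fs_webfs_on_drbd inf: fs_webfs ms_drbd_webfs:Master
  order fs_webfs_after_drbd inf: ms_drbd_webfs:promote fs_webfs:start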
[Linux-HA] SLES11 SP1: bug in crm shell completion
Hi!

The crm shell of SLES11 SP1 has the following auto-completion bug: after defining new primitives in crm configure, the new primitives don't show up in completion after commit until the configure context is re-entered (e.g. by up, configure).

While talking about completion: if I enter "del foo" and move the cursor back behind the 'l' of "del", crm doesn't complete the command (to "delete") as long as there's another argument to the right of the cursor. In Bash, similar completion works. Can it be implemented in the crm shell as well?

Regards,
Ulrich
Re: [Linux-HA] SLES11 SP1: bug in crm shell completion
On Fri, May 20, 2011 at 01:32:11PM +0200, Ulrich Windl wrote:

Hi! The crm shell of SLES11 SP1 has the following auto-completion bug: after defining new primitives in crm configure, the new primitives don't show up in completion after commit until the configure context is re-entered (e.g. by up, configure). While talking about completion: if I enter "del foo" and move the cursor back behind the 'l' of "del", crm doesn't complete the command (to "delete") as long as there's another argument to the right of the cursor. In Bash, similar completion works. Can it be implemented in the crm shell as well?

Sure. Patches accepted ;-)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
Hi Lars,

Thank you for the tools to look at things. However, on a whim, before getting into them (as DRBD was looking fine in that scenario), I decided to just run through the install on a different pair of VMs, making sure I used the gitco.de repository for drbd83 and the clusterlabs repo for pacemaker (heartbeat and everything else comes with it once the libesmtp requirement is settled, in this case by using a later epel install: rpm -ivh epel-release-5-4.noarch.rpm).

Using the exact same configuration in crm (except that standby is off on both VMs, of course), when I do the same crm node standby on one, the other takes over, and then back again, no problem. I am going to go back and either reinstall the other and/or compare each and every rpm and source to see which is broken, or just store my install procedure. Now off to learn what you mentioned about crm resource move. Thanks again.

Regards,
Randy

On 5/20/2011 1:03 AM, Lars Ellenberg wrote: [...]
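As a pointer for the move/unmove commands mentioned above, a minimal sketch (resource and node names are the ones used in this thread):

  # crm resource move WebServices ha2.iohost.com

moves the group to ha2 by inserting a temporary location constraint, and

  # crm resource unmove WebServices

removes that constraint again so the cluster is free to place the group on its own. Unlike putting a node in standby, this leaves both nodes able to run resources.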