The basics: Dual-primary cman+pacemaker+drbd cluster running on RHEL6.2; spec files and versions below.
Problem: If I restart both nodes at the same time, or even just start pacemaker on both nodes at the same time, the drbd ms resource starts, but both nodes stay in slave mode. They'll both stay in slave mode until one of the following occurs:

- I manually type "crm resource cleanup <ms-resource-name>"

- 15 minutes elapse. Then the "PEngine Recheck Timer" is fired, and the ms resources are promoted.

The key resource definitions:

    primitive AdminDrbd ocf:linbit:drbd \
            params drbd_resource="admin" \
            op monitor interval="59s" role="Master" timeout="30s" \
            op monitor interval="60s" role="Slave" timeout="30s" \
            op stop interval="0" timeout="100" \
            op start interval="0" timeout="240" \
            meta target-role="Master"
    ms AdminClone AdminDrbd \
            meta master-max="2" master-node-max="1" clone-max="2" \
            clone-node-max="1" notify="true" interleave="true"
    # The lengthy definition of "FilesystemGroup" is in the crm pastebin below
    clone FilesystemClone FilesystemGroup \
            meta interleave="true" target-role="Started"
    colocation Filesystem_With_Admin inf: FilesystemClone AdminClone:Master
    order Admin_Before_Filesystem inf: AdminClone:promote FilesystemClone:start

Note that I stuck in "target-role" options to try to solve the problem; no effect.

When I look in /var/log/messages, I see no error messages or indications why the promotion should be delayed. The 'admin' drbd resource is reported as UpToDate on both nodes. There are no error messages when I force the issue with:

    crm resource cleanup AdminClone

It's as if pacemaker, at start, needs some kind of "kick" after the drbd resource is ready to be promoted.

This is not just an abstract case for me. At my site, it's not uncommon for there to be lengthy power outages that will bring down the cluster. Both systems will come up when power is restored, and I need for cluster services to be available shortly afterward, not 15 minutes later.

Any ideas?
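For reference, the manual "kick" could be scripted as a stopgap. This is only a sketch, not a fix for the underlying delay. The resource names "admin" and "AdminClone" come from the configuration in this post; the helper names (run, disks_uptodate, kick_promotion) and the DRY_RUN and DSTATE_CMD variables are hypothetical and exist only for illustration. It assumes "drbdadm dstate" prints "UpToDate/UpToDate" when both sides are in sync.

```shell
#!/bin/sh
# Sketch: re-run "crm resource cleanup" on the ms resource once DRBD
# reports both disks UpToDate. All helper names here are hypothetical.

RESOURCE="${RESOURCE:-admin}"      # drbd resource name from the post
MS_NAME="${MS_NAME:-AdminClone}"   # ms resource name from the post

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Succeeds only when local and peer disk states are both UpToDate.
# DSTATE_CMD can be overridden for testing; the default calls drbdadm.
disks_uptodate() {
    state=$(${DSTATE_CMD:-drbdadm dstate} "$RESOURCE" 2>/dev/null)
    [ "$state" = "UpToDate/UpToDate" ]
}

# Clean up the ms resource so the policy engine re-evaluates promotion.
kick_promotion() {
    if disks_uptodate; then
        run crm resource cleanup "$MS_NAME"
    else
        echo "drbd resource $RESOURCE is not UpToDate on both sides; not kicking" >&2
        return 1
    fi
}

# Usage (after sourcing this file):
#   DRY_RUN=1 kick_promotion    # show what would be run
#   kick_promotion              # actually run the cleanup
```

Another stopgap would presumably be to shorten the recheck timer, e.g. crm configure property cluster-recheck-interval="2min", since the 15-minute delay matches that property's default. Neither approach explains why the promotion is not triggered in the first place.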
Details:

    # rpm -q kernel cman pacemaker drbd
    kernel-2.6.32-220.4.1.el6.x86_64
    cman-3.0.12.1-23.el6.x86_64
    pacemaker-1.1.6-3.el6.x86_64
    drbd-8.4.1-1.el6.x86_64

Output of crm_mon after two-node reboot or pacemaker restart:
<http://pastebin.com/jzrpCk3i>

cluster.conf: <http://pastebin.com/sJw4KBws>

"crm configure show": <http://pastebin.com/MgYCQ2JH>

"drbdadm dump all": <http://pastebin.com/NrY6bskk>

--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/