Hi James,

On Mon, 30 Sep 2013 12:31:52 -0700 James Oakley <jf...@funktronics.ca> wrote:
> I am having some trouble with DRBD Master/Slave resources in a 3-node
> cluster.
>
> I am using the Pacemaker packages from ha-clustering:Stable on
> openSUSE 12.3. I was going to try the packages from Unstable to see
> if they work better, but it seems the openais package is missing
> there.

I have a very similar setup, currently running on stock openSUSE 12.2.
I also have a test system that was just updated to 12.3 with the
ha-clustering:Stable packages. With STONITH enabled it fails almost
instantly because of segfaults in the stonith resources. On 12.2 it
works flawlessly, and on 12.3 with the Stable repo it also works, as
long as STONITH is disabled.

> So I have 3 nodes, called arthur, jonas, and rusty. The jonas and
> rusty nodes have 4 DRBD master/slave resources, which are used to
> back a series of filesystems, while the arthur node is included
> mainly to avoid split-brain, but I intend to run some resources on it
> as well, and possibly add some more nodes.
:
> Is there anything obvious I am missing?

I don't know, but my configuration is, as I said, very similar, yet a
_lot_ shorter: it uses groups and therefore needs far fewer constraints
and location definitions. My nodes are virtual machines in VMware,
hence the vcenter stonith resources.
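To illustrate why groups shorten things: a group implies both
colocation and start ordering between its members, in the order they
are listed. A minimal sketch with three hypothetical Dummy resources
(rsc_a, rsc_b, rsc_c and the constraint ids are made up for
illustration and appear in neither of our configurations):

```
primitive rsc_a ocf:pacemaker:Dummy
primitive rsc_b ocf:pacemaker:Dummy
primitive rsc_c ocf:pacemaker:Dummy

# Variant 1: explicit pairwise constraints (hypothetical ids)
colocation cl-b_with_a inf: rsc_b rsc_a
colocation cl-c_with_b inf: rsc_c rsc_b
order o-a_before_b inf: rsc_a rsc_b
order o-b_before_c inf: rsc_b rsc_c

# Variant 2: one group instead of the four constraints above
# (use either variant, not both)
group g-abc rsc_a rsc_b rsc_c
```

With a group, adding another resource to the chain is just one more
name on the group line instead of another colocation/order pair.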
The nodes hermes1 and hermes2 hold the DRBD resources, with hermes1
being the preferred node; hermes3 is there for quorum (and logs):

=====
node hermes1
node hermes2
node hermes3
primitive apache2 lsb:apache2 \
        meta failure-timeout="90" \
        operations $id="apache2-operations" \
        op monitor interval="15" timeout="15"
primitive drbdr0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30"
primitive drbdr1 ocf:linbit:drbd \
        params drbd_resource="r1" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30" \
        meta target-role="Started"
primitive firewall_rules lsb:firewall_rules \
        meta failure-timeout="90" \
        operations $id="firewall_rules-operations" \
        op monitor interval="60" timeout="60"
primitive fs_0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/conf" fstype="ext4" options="defaults" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="40" depth="0" \
        meta target-role="Started"
primitive fs_1 ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r1" directory="/var/spool/postfix" fstype="ext4" options="defaults" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="40" depth="0" \
        meta target-role="Started"
primitive getrecipientaccess lsb:getrecipientaccess \
        meta failure-timeout="90" \
        operations $id="getrecipientaccess-operations" \
        op monitor interval="15" timeout="15"
primitive mailgraph lsb:mailgraph \
        meta failure-timeout="90" \
        operations $id="mailgraph-operations" \
        op monitor interval="15" timeout="15"
primitive policyd-weight lsb:policyd-weight \
        meta failure-timeout="90" \
        operations $id="policyd-weight-operations" \
        op monitor interval="15" timeout="15"
primitive postfix lsb:postfix \
        meta failure-timeout="90" \
        operations $id="postfix-operations" \
        op monitor interval="15" timeout="15"
primitive postgrey lsb:postgrey \
        meta failure-timeout="90" \
        operations $id="postgrey-operations" \
        op monitor interval="15" timeout="15"
primitive queuegraph lsb:queuegraph \
        meta failure-timeout="90" \
        operations $id="queuegraph-operations" \
        op monitor interval="15" timeout="15"
primitive saslauthd lsb:saslauthd \
        meta failure-timeout="90" \
        operations $id="saslauthd-operations" \
        op monitor interval="15" timeout="15"
primitive spammailgraph lsb:spammailgraph \
        meta failure-timeout="90" \
        operations $id="spammailgraph-operations" \
        op monitor interval="15" timeout="15"
primitive updateispwhitelist lsb:updateispwhitelist \
        meta failure-timeout="90" \
        operations $id="updateispwhitelist-operations" \
        op monitor interval="15" timeout="15"
primitive vfencing stonith:external/vcenter \
        params VI_SERVER="svirtctr.it.ctr.internal" \
        VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
        HOSTLIST="hermes1=SHERMES1;hermes2=SHERMES2;shermes3=SHERMES3" \
        RESETPOWERON="0" \
        op monitor start-delay="15s" interval="3600s"
primitive vip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.183.75.23" nic="eth0" iflabel="0" cidr_netmask="26" \
        op monitor interval="10" timeout="20"
group apps vip_1 firewall_rules postgrey policyd-weight saslauthd postfix apache2 mailgraph queuegraph spammailgraph getrecipientaccess updateispwhitelist \
        meta target-role="Started"
group fs fs_0 fs_1
group g-drbd drbdr0 drbdr1
ms ms_drbd g-drbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone Fencing vfencing
location l-Fencing_hermes1 Fencing 0: hermes1
location l-Fencing_hermes2 Fencing 0: hermes2
location l-Fencing_hermes3 Fencing 0: hermes3
location l-apache2-hermes3 apache2 -inf: hermes3
location l-apps-hermes1 apps 50: hermes1
location l-apps-hermes2 apps 0: hermes2
location l-fs-hermes1 fs 50: hermes1
location l-fs-hermes2 fs 0: hermes2
location l-mailgraph-hermes3 mailgraph -inf: hermes3
location l-ms_drbd_hermes1 ms_drbd 50: hermes1
location l-ms_drbd_hermes2 ms_drbd 0: hermes2
location l-postfix-hermes3 postfix -inf: hermes3
location l-queuegraph-hermes3 queuegraph -inf: hermes3
location l-spammailgraph-hermes3 spammailgraph -inf: hermes3
colocation cl-apps_on_fs inf: fs:Started apps:Started
colocation cl-fs_on_drbd_r0 inf: ms_drbd:Master fs:Started
order o-apps_after_fs inf: fs:start apps:start
order o-fs_after_drbd inf: ms_drbd:promote fs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        symmetric-cluster="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1380199456" \
        stonith-action="poweroff"
=====

Note that there are only two colocation and two order statements, and
I believe I could get rid of some of the location statements, too.

As I said, this setup currently runs on openSUSE 12.2. I know 13.1 is
near, but I fear the state of ha-clustering in 13.1 will not be that
great, so maybe give it a try with a 12.2 installation first.

Greetings,

Stefan

-- 
Stefan Botter zu Hause
Bremen

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org