Hi Martin,

I don't have drbd set to start automatically in any runlevel, and I did a "rcdrbd stop" on both nodes before starting openais. I just repeated it one more time, checking lsmod first to confirm that drbd is not loaded, and the result is the same. One piece of extra information: even though drbd fails to start up correctly, there is at least partial success:
storm:~ # rcdrbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by p...@fat-tyre, 2010-01-13 17:17:27
m:res  cs         ro                   ds                 p  mounted  fstype
0:r0   Connected  Secondary/Secondary  UpToDate/UpToDate  C

This status is the same on both nodes. It looks like all that's missing is to promote the correct node. Starting up drbd manually takes about 15s. Starting openais on storm alone only started my stonith-fencing resource, none of the others. I'm going to simplify my setup and get rid of everything but the core resources.

Is pacemaker 1.1.2 (the version included in SLES 11 SP1 HA) actually stable? The highest pre-built binary version available from clusterlabs.org seems to be 1.0.9.

Thanks,
Bart

-----Original Message-----
From: martin.br...@icw.de [mailto:martin.br...@icw.de]
Sent: Thursday, July 01, 2010 11:03
To: b...@atipa.com; The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd

Hi Bart,

just some more thoughts:

Are you sure that drbd was really stopped? Does this error also happen after a clean restart (without drbd starting at runlevel), i.e. "lsmod | grep drbd" without results?

How long does it take if you set up drbd (attach, syncer, connect, primary) manually?

What happens when you start openais on only one node?

The syncer rate seems a bit high to me
(http://www.drbd.org/users-guide/s-configure-syncer-rate.html#eq-syncer-rate-example1),
but that should not be the problem.

HTH,
Martin

"Bart Willems" <b...@atipa.com> wrote on 01.07.2010 16:42:26:

> Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
>
> Bart Willems
>
> to:
>
> 'The Pacemaker cluster resource manager'
>
> 01.07.2010 16:46
>
> Please respond to bart, The Pacemaker cluster resource manager
>
> Hi Martin,
>
> No luck I'm afraid.
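[Editor's note: for reference, the rule of thumb behind the users-guide link Martin posted is syncer rate = 0.3 x the available replication bandwidth, i.e. the slower of the network link and the storage. A quick sketch of the arithmetic, assuming the ~110 MB/s effective throughput of a Gigabit link (the figure used in the guide's example):]

```shell
#!/bin/sh
# DRBD users' guide rule of thumb: syncer rate = 0.3 * available
# replication bandwidth (the slower of network link and disks).
# 110 MB/s is an assumed figure for effective Gigabit Ethernet throughput.
BANDWIDTH=110                    # MB/s, slower of network and storage
RATE=$(( BANDWIDTH * 3 / 10 ))   # 30% of the available bandwidth
echo "suggested syncer rate: ${RATE}M"
```

[By that measure the configured "rate 120M;" exceeds even the full bandwidth of a Gigabit link, which matches Martin's remark, though as he says it is unlikely to be the root cause here.]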
> I first added a start-delay to the monitor operations, and when that
> didn't work I also added a start-delay to the start operation:
>
> primitive drbd-storage ocf:linbit:drbd \
>   params drbd_resource="r0" \
>   op monitor interval="10" role="Master" timeout="60" start-delay="1m" \
>   op start interval="0" timeout="240s" start-delay="1m" \
>   op stop interval="0" timeout="100s" \
>   op monitor interval="20" role="Slave" timeout="60" start-delay="1m"
>
> Thanks,
> Bart
>
> -----Original Message-----
> From: martin.br...@icw.de [mailto:martin.br...@icw.de]
> Sent: Thursday, July 01, 2010 3:37
> To: b...@atipa.com; The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
>
> Hi Bart,
>
> my guess is that you forgot the start-delay attribute for the monitor
> operations; that's why you see the time-out error message.
>
> Here is an example:
>
>   op monitor interval="20" role="Slave" timeout="20" start-delay="1m" \
>   op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
>   op start interval="0" timeout="240s" \
>   op stop interval="0" timeout="100s" \
>   params drbd_resource="r0" drbdconf="/usr/local/etc/drbd.conf"
>
> HTH,
> Martin
>
> "Bart Willems" <b...@atipa.com> wrote on 30.06.2010 21:57:35:
>
> > [Pacemaker] pacemaker fails to start drbd using ocf:linbit:drbd
> >
> > Bart Willems
> >
> > to:
> >
> > pacemaker
> >
> > 30.06.2010 21:56
> >
> > From:
> >
> > "Bart Willems" <b...@atipa.com>
> >
> > To:
> >
> > <pacemaker@oss.clusterlabs.org>
> >
> > Please respond to b...@atipa.com, The Pacemaker cluster resource
> > manager <pacemaker@oss.clusterlabs.org>
> >
> > Hi All,
> >
> > I am setting up SLES11 SP1 HA on 2 nodes and have configured a
> > master/slave drbd resource. I can start drbd, promote/demote hosts,
> > and mount/use the file system from the command line, but pacemaker
> > fails to properly start up the drbd service.
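[Editor's note: the manual sequence Martin asks about (attach, syncer, connect, primary) can be sketched as below. The resource name r0, the /dev/drbd0 device, and the /disk1 mount point are taken from the thread; the drbdadm calls are guarded with command -v so the sketch is a no-op on a machine without DRBD installed.]

```shell
#!/bin/sh
# Sketch of the manual bring-up that works from the command line,
# using the drbd 8.3 drbdadm subcommands Martin lists.
if command -v drbdadm >/dev/null 2>&1; then
    drbdadm attach  r0        # attach the backing device
    drbdadm syncer  r0        # apply the syncer settings
    drbdadm connect r0        # connect to the peer node
    drbdadm primary r0        # promote (on the intended master only)
    mount /dev/drbd0 /disk1   # then the file system can be used
fi
echo "manual bring-up sketch done"
```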
> > The 2 nodes are named storm (master) and storm-b (slave).
> >
> > Details of my setup are:
> >
> > **********
> > * storm: *
> > **********
> >
> > eth0: 172.16.0.1/16 (static)
> > eth1: 172.20.168.239 (dhcp)
> > ipmi: 172.16.1.1/16 (static)
> >
> > ************
> > * storm-b: *
> > ************
> >
> > eth0: 172.16.0.2/16 (static)
> > eth1: 172.20.168.114 (dhcp)
> > ipmi: 172.16.1.2/16 (static)
> >
> > ***********************
> > * drbd configuration: *
> > ***********************
> >
> > storm:~ # cat /etc/drbd.conf
> > #
> > # please have a look at the example configuration file in
> > # /usr/share/doc/packages/drbd-utils/drbd.conf
> > #
> > # Note that you can use the YaST2 drbd module to configure this
> > # service!
> > #
> > include "drbd.d/global_common.conf";
> > include "drbd.d/*.res";
> >
> > storm:~ # cat /etc/drbd.d/r0.res
> > resource r0 {
> >   device /dev/drbd_r0 minor 0;
> >   meta-disk internal;
> >   on storm {
> >     disk /dev/sdc1;
> >     address 172.16.0.1:7811;
> >   }
> >   on storm-b {
> >     disk /dev/sde1;
> >     address 172.16.0.2:7811;
> >   }
> >   syncer {
> >     rate 120M;
> >   }
> > }
> >
> > ***********************************
> > * Output of "crm configure show": *
> > ***********************************
> >
> > storm:~ # crm configure show
> > node storm
> > node storm-b
> > primitive backupExec-ip ocf:heartbeat:IPaddr \
> >   params ip="172.16.0.10" cidr_netmask="16" nic="eth0" \
> >   op monitor interval="30s"
> > primitive drbd-storage ocf:linbit:drbd \
> >   params drbd_resource="r0" \
> >   op monitor interval="60" role="Master" timeout="60" \
> >   op start interval="0" timeout="240" \
> >   op stop interval="0" timeout="100" \
> >   op monitor interval="61" role="Slave" timeout="60"
> > primitive drbd-storage-fs ocf:heartbeat:Filesystem \
> >   params device="/dev/drbd0" directory="/disk1" fstype="ext3"
> > primitive public-ip ocf:heartbeat:IPaddr \
> >   meta target-role="started" \
> >   operations $id="public-ip-operations" \
> >   op monitor interval="30s" \
> >   params ip="143.219.41.20" cidr_netmask="24" nic="eth1"
> > primitive storm-fencing stonith:external/ipmi \
> >   meta target-role="started" \
> >   operations $id="storm-fencing-operations" \
> >   op monitor interval="60" timeout="20" \
> >   op start interval="0" timeout="20" \
> >   params hostname="storm" ipaddr="172.16.1.1" userid="****" passwd="****" interface="lan"
> > ms drbd-storage-masterslave drbd-storage \
> >   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="started"
> > location drbd-storage-master-location drbd-storage-masterslave +inf: storm
> > location storm-fencing-location storm-fencing +inf: storm-b
> > colocation drbd-storage-fs-together inf: drbd-storage-fs drbd-storage-masterslave:Master
> > order drbd-storage-fs-startup-order inf: drbd-storage-masterslave:promote drbd-storage-fs:start
> > property $id="cib-bootstrap-options" \
> >   dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> >   cluster-infrastructure="openais" \
> >   expected-quorum-votes="2" \
> >   no-quorum-policy="ignore" \
> >   last-lrm-refresh="1277922623" \
> >   node-health-strategy="only-green" \
> >   stonith-enabled="true" \
> >   stonith-action="poweroff"
> > op_defaults $id="op_defaults-options" \
> >   record-pending="false"
> >
> > ************************************
> > * Output of "crm_mon -o" on storm: *
> > ************************************
> >
> > storm:~ # crm_mon -o
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:15 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ storm storm-b ]
> >
> > storm-fencing  (stonith:external/ipmi):  Started storm-b
> > backupExec-ip  (ocf::heartbeat:IPaddr):  Started storm
> > public-ip      (ocf::heartbeat:IPaddr):  Started storm
> >
> > Operations:
> > * Node storm:
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b:
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=6)
> >
> > **************************************
> > * Output of "crm_mon -o" on storm-b: *
> > **************************************
> >
> > storm-b:~ # crm_mon -o
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:25 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ storm storm-b ]
> >
> > storm-fencing  (stonith:external/ipmi):  Started storm-b
> > backupExec-ip  (ocf::heartbeat:IPaddr):  Started storm
> > public-ip      (ocf::heartbeat:IPaddr):  Started storm
> >
> > Operations:
> > * Node storm:
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b:
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=60000ms rc=0 (ok)
> >    drbd-storage:1: migration-threshold=1000000 fail-count=1000000
> >     + (8) start: rc=-2 (unknown exec error)
> >     + (12) stop: rc=0 (ok)
> >
> > Failed actions:
> >     drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out): unknown exec error
> >     drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out): unknown exec error
> >
> > ********************************************************
> > * Output of "rcdrbd status" on both storm and storm-b: *
> > ********************************************************
> >
> > # rcdrbd status
> > drbd driver loaded OK; device status:
> > version: 8.3.7 (api:88/proto:86-91)
> > GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by p...@fat-tyre, 2010-01-13 17:17:27
> > m:res  cs          ro                 ds                 p  mounted  fstype
> > 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r----
> >
> > *********************************
> > * Part of the drbd log entries: *
> > *********************************
> >
> > Jun 30 15:38:10 storm kernel: [ 3730.185457] drbd: initialized.
> > Version: 8.3.7 (api:88/proto:86-91)
> > Jun 30 15:38:10 storm kernel: [ 3730.185459] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by p...@fat-tyre, 2010-01-13 17:17:27
> > Jun 30 15:38:10 storm kernel: [ 3730.185460] drbd: registered as block device major 147
> > Jun 30 15:38:10 storm kernel: [ 3730.185462] drbd: minor_table @ 0xffff88035fc0ca80
> > Jun 30 15:38:10 storm kernel: [ 3730.188253] block drbd0: Starting worker thread (from cqueue [9510])
> > Jun 30 15:38:10 storm kernel: [ 3730.188312] block drbd0: disk( Diskless -> Attaching )
> > Jun 30 15:38:10 storm kernel: [ 3730.188866] block drbd0: Found 4 transactions (4 active extents) in activity log.
> > Jun 30 15:38:10 storm kernel: [ 3730.188868] block drbd0: Method to ensure write ordering: barrier
> > Jun 30 15:38:10 storm kernel: [ 3730.188870] block drbd0: max_segment_size ( = BIO size ) = 32768
> > Jun 30 15:38:10 storm kernel: [ 3730.188872] block drbd0: drbd_bm_resize called with capacity == 9765216
> > Jun 30 15:38:10 storm kernel: [ 3730.188907] block drbd0: resync bitmap: bits=1220652 words=19073
> > Jun 30 15:38:10 storm kernel: [ 3730.188910] block drbd0: size = 4768 MB (4882608 KB)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stdout)
> > Jun 30 15:38:10 storm kernel: [ 3730.189263] block drbd0: recounting of set bits took additional 0 jiffies
> > Jun 30 15:38:10 storm kernel: [ 3730.189265] block drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
> > Jun 30 15:38:10 storm kernel: [ 3730.189269] block drbd0: disk( Attaching -> UpToDate )
> > Jun 30 15:38:10 storm kernel: [ 3730.191735] block drbd0: conn( StandAlone -> Unconnected )
> > Jun 30 15:38:10 storm kernel: [ 3730.191748] block drbd0: Starting receiver thread (from drbd0_worker [15487])
> > Jun 30 15:38:10 storm kernel: [ 3730.191780] block drbd0: receiver (re)started
> > Jun 30 15:38:10 storm kernel: [ 3730.191785] block drbd0: conn( Unconnected -> WFConnection )
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) 0: Failure: (124) Device is attached to a disk (use detach first)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) Command 'drbdsetup 0 disk /dev/sdc1 /dev/sdc1 internal
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) --set-defaults --create-device' terminated with exit code 10
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Called drbdadm -c /etc/drbd.conf --peer storm-b up r0
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Exit code 1
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Command output:
> >
> > I made sure rcdrbd was stopped before starting rcopenais, so the failure
> > related to the device being attached arises during openais startup.
> >
> > *************************
> > * Result of ocf-tester: *
> > *************************
> >
> > storm:~ # ocf-tester -n drbd-storage -o drbd_resource="r0" /usr/lib/ocf/resource.d/linbit/drbd
> > Beginning tests for /usr/lib/ocf/resource.d/linbit/drbd...
> > * rc=6: Validation failed. Did you supply enough options with -o ?
> > Aborting tests
> >
> > The only required parameter according to "crm ra info ocf:linbit:drbd" is
> > drbd_resource, so there shouldn't be any additional options required to
> > make ocf-tester work.
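[Editor's note: the "(use detach first)" failure above means the device was still attached when the resource agent ran "drbdadm up". One way to rule that out is a pre-flight check before starting openais. A sketch, using resource r0 from the thread; the module check reads /proc/modules directly, and drbdadm is guarded since it only exists on the cluster nodes:]

```shell
#!/bin/sh
# Pre-flight check before starting openais: verify the drbd kernel module
# is not loaded, and if it is, bring the resource fully down so the
# resource agent's "drbdadm up" starts from a clean state.
if grep -qs '^drbd ' /proc/modules; then
    echo "drbd module still loaded"
    command -v drbdadm >/dev/null 2>&1 && drbdadm down r0   # detach + disconnect
else
    echo "drbd module not loaded"
fi
echo "pre-flight check done"
```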
> >
> > Any suggestions for debugging and solutions would be most appreciated.
> >
> > Thanks,
> > Bart
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
> InterComponentWare AG:
> Vorstand: Peter Kirschbauer (Vors.), Jörg Stadler / Aufsichtsratsvors.: Prof. Dr. Christof Hettich
> Firmensitz: 69190 Walldorf, Altrottstraße 31 / AG Mannheim HRB 351761 / USt.-IdNr.: DE 198388516