Hi,

On Thu, Jul 01, 2010 at 10:37:10AM +0200, martin.br...@icw.de wrote:
> Hi Bart,
>
> my guess is that you did forget the start-delay attribute for the
> monitor operations; that's why you see the time-out error message.
>
> Here is an example:
>
>     op monitor interval="20" role="Slave" timeout="20" start-delay="1m" \
>     op monitor interval="10" role="Master" timeout="20" start-delay="1m" \
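[start-delay postpones an operation's first run after a start, so a
monitor does not fire before a slow-starting resource is ready. As a
sketch only, Martin's fragment applied to Bart's existing drbd-storage
primitive (quoted further down) would look like this; the delay and
timeout values are illustrative, not recommendations:

    primitive drbd-storage ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="60" role="Master" timeout="60" start-delay="1m" \
        op monitor interval="61" role="Slave" timeout="60" start-delay="1m" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100"

As Dejan points out next, however, the failed actions in this thread are
start time-outs, which a start-delay on the monitor operations cannot fix.]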
Why is start-delay needed? If it is, then there is most probably
something wrong with the resource agent.

According to the crm_mon output, the start actions timed out:

> > Failed actions:
> >     drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out):
> >         unknown exec error
> >     drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out):
> >         unknown exec error

Probably you need to increase the timeout for start then.

Thanks,

Dejan

>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s" \
>     params drbd_resource="r0" drbdconf="/usr/local/etc/drbd.conf"
>
> HTH,
> Martin
>
> "Bart Willems" <b...@atipa.com> wrote on 30.06.2010 21:57:35:
>
> > Hi All,
> >
> > I am setting up SLES11 SP1 HA on 2 nodes and have configured a
> > master/slave drbd resource. I can start drbd, promote/demote hosts,
> > and mount/use the file system from the command line, but pacemaker
> > fails to properly start up the drbd service. The 2 nodes are named
> > storm (master) and storm-b (slave).
> >
> > Details of my setup are:
> >
> > **********
> > * storm: *
> > **********
> >
> > eth0: 172.16.0.1/16 (static)
> > eth1: 172.20.168.239 (dhcp)
> > ipmi: 172.16.1.1/16 (static)
> >
> > ************
> > * storm-b: *
> > ************
> >
> > eth0: 172.16.0.2/16 (static)
> > eth1: 172.20.168.114 (dhcp)
> > ipmi: 172.16.1.2/16 (static)
> >
> > ***********************
> > * drbd configuration: *
> > ***********************
> >
> > storm:~ # cat /etc/drbd.conf
> > #
> > # please have a look at the example configuration file in
> > # /usr/share/doc/packages/drbd-utils/drbd.conf
> > #
> > # Note that you can use the YaST2 drbd module to configure this
> > # service!
> > #
> > include "drbd.d/global_common.conf";
> > include "drbd.d/*.res";
> >
> > storm:~ # cat /etc/drbd.d/r0.res
> > resource r0 {
> >   device    /dev/drbd_r0 minor 0;
> >   meta-disk internal;
> >   on storm {
> >     disk    /dev/sdc1;
> >     address 172.16.0.1:7811;
> >   }
> >   on storm-b {
> >     disk    /dev/sde1;
> >     address 172.16.0.2:7811;
> >   }
> >   syncer {
> >     rate 120M;
> >   }
> > }
> >
> > ***********************************
> > * Output of "crm configure show": *
> > ***********************************
> >
> > storm:~ # crm configure show
> > node storm
> > node storm-b
> > primitive backupExec-ip ocf:heartbeat:IPaddr \
> >   params ip="172.16.0.10" cidr_netmask="16" nic="eth0" \
> >   op monitor interval="30s"
> > primitive drbd-storage ocf:linbit:drbd \
> >   params drbd_resource="r0" \
> >   op monitor interval="60" role="Master" timeout="60" \
> >   op start interval="0" timeout="240" \
> >   op stop interval="0" timeout="100" \
> >   op monitor interval="61" role="Slave" timeout="60"
> > primitive drbd-storage-fs ocf:heartbeat:Filesystem \
> >   params device="/dev/drbd0" directory="/disk1" fstype="ext3"
> > primitive public-ip ocf:heartbeat:IPaddr \
> >   meta target-role="started" \
> >   operations $id="public-ip-operations" \
> >   op monitor interval="30s" \
> >   params ip="143.219.41.20" cidr_netmask="24" nic="eth1"
> > primitive storm-fencing stonith:external/ipmi \
> >   meta target-role="started" \
> >   operations $id="storm-fencing-operations" \
> >   op monitor interval="60" timeout="20" \
> >   op start interval="0" timeout="20" \
> >   params hostname="storm" ipaddr="172.16.1.1" userid="****" \
> >     passwd="****" interface="lan"
> > ms drbd-storage-masterslave drbd-storage \
> >   meta master-max="1" master-node-max="1" clone-max="2" \
> >     clone-node-max="1" notify="true" globally-unique="false" \
> >     target-role="started"
> > location drbd-storage-master-location drbd-storage-masterslave +inf: storm
> > location storm-fencing-location storm-fencing +inf: storm-b
> > colocation drbd-storage-fs-together inf: drbd-storage-fs \
> >     drbd-storage-masterslave:Master
> > order drbd-storage-fs-startup-order inf: drbd-storage-masterslave:promote \
> >     drbd-storage-fs:start
> > property $id="cib-bootstrap-options" \
> >   dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> >   cluster-infrastructure="openais" \
> >   expected-quorum-votes="2" \
> >   no-quorum-policy="ignore" \
> >   last-lrm-refresh="1277922623" \
> >   node-health-strategy="only-green" \
> >   stonith-enabled="true" \
> >   stonith-action="poweroff"
> > op_defaults $id="op_defaults-options" \
> >   record-pending="false"
> >
> > ************************************
> > * Output of "crm_mon -o" on storm: *
> > ************************************
> >
> > storm:~ # crm_mon -o
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:15 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ storm storm-b ]
> >
> > storm-fencing  (stonith:external/ipmi):  Started storm-b
> > backupExec-ip  (ocf::heartbeat:IPaddr):  Started storm
> > public-ip      (ocf::heartbeat:IPaddr):  Started storm
> >
> > Operations:
> > * Node storm:
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b:
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=60000ms rc=0 (ok)
> >
> > **************************************
> > * Output of "crm_mon -o" on storm-b: *
> > **************************************
> >
> > storm-b:~ # crm_mon -o
> > Attempting connection to the cluster...
> > ============
> > Last updated: Wed Jun 30 15:25:25 2010
> > Stack: openais
> > Current DC: storm - partition with quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ storm storm-b ]
> >
> > storm-fencing  (stonith:external/ipmi):  Started storm-b
> > backupExec-ip  (ocf::heartbeat:IPaddr):  Started storm
> > public-ip      (ocf::heartbeat:IPaddr):  Started storm
> >
> > Operations:
> > * Node storm:
> >    public-ip: migration-threshold=1000000
> >     + (8) start: rc=0 (ok)
> >     + (11) monitor: interval=30000ms rc=0 (ok)
> >    backupExec-ip: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (10) monitor: interval=30000ms rc=0 (ok)
> >    drbd-storage:0: migration-threshold=1000000 fail-count=1000000
> >     + (9) start: rc=-2 (unknown exec error)
> >     + (14) stop: rc=0 (ok)
> > * Node storm-b:
> >    storm-fencing: migration-threshold=1000000
> >     + (7) start: rc=0 (ok)
> >     + (9) monitor: interval=60000ms rc=0 (ok)
> >    drbd-storage:1: migration-threshold=1000000 fail-count=1000000
> >     + (8) start: rc=-2 (unknown exec error)
> >     + (12) stop: rc=0 (ok)
> >
> > Failed actions:
> >     drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out):
> >         unknown exec error
> >     drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out):
> >         unknown exec error
> >
> > ********************************************************
> > * Output of "rcdrbd status" on both storm and storm-b: *
> > ********************************************************
> >
> > # rcdrbd status
> > drbd driver loaded OK; device status:
> > version: 8.3.7 (api:88/proto:86-91)
> > GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by p...@fat-tyre,
> >   2010-01-13 17:17:27
> > m:res  cs          ro                 ds                 p  mounted  fstype
> > 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r----
> >
> > *********************************
> > * Part of the drbd log entries: *
> > *********************************
> >
> > Jun 30 15:38:10 storm kernel: [ 3730.185457] drbd: initialized. Version:
> >   8.3.7 (api:88/proto:86-91)
> > Jun 30 15:38:10 storm kernel: [ 3730.185459] drbd: GIT-hash:
> >   ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by p...@fat-tyre,
> >   2010-01-13 17:17:27
> > Jun 30 15:38:10 storm kernel: [ 3730.185460] drbd: registered as block
> >   device major 147
> > Jun 30 15:38:10 storm kernel: [ 3730.185462] drbd: minor_table @
> >   0xffff88035fc0ca80
> > Jun 30 15:38:10 storm kernel: [ 3730.188253] block drbd0: Starting worker
> >   thread (from cqueue [9510])
> > Jun 30 15:38:10 storm kernel: [ 3730.188312] block drbd0: disk( Diskless
> >   -> Attaching )
> > Jun 30 15:38:10 storm kernel: [ 3730.188866] block drbd0: Found 4
> >   transactions (4 active extents) in activity log.
> > Jun 30 15:38:10 storm kernel: [ 3730.188868] block drbd0: Method to ensure
> >   write ordering: barrier
> > Jun 30 15:38:10 storm kernel: [ 3730.188870] block drbd0:
> >   max_segment_size ( = BIO size ) = 32768
> > Jun 30 15:38:10 storm kernel: [ 3730.188872] block drbd0: drbd_bm_resize
> >   called with capacity == 9765216
> > Jun 30 15:38:10 storm kernel: [ 3730.188907] block drbd0: resync bitmap:
> >   bits=1220652 words=19073
> > Jun 30 15:38:10 storm kernel: [ 3730.188910] block drbd0: size = 4768 MB
> >   (4882608 KB)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output:
> >   (drbd-storage:0:start:stdout)
> > Jun 30 15:38:10 storm kernel: [ 3730.189263] block drbd0: recounting of
> >   set bits took additional 0 jiffies
> > Jun 30 15:38:10 storm kernel: [ 3730.189265] block drbd0: 4 KB (1 bits)
> >   marked out-of-sync by on disk bit-map.
> > Jun 30 15:38:10 storm kernel: [ 3730.189269] block drbd0: disk( Attaching
> >   -> UpToDate )
> > Jun 30 15:38:10 storm kernel: [ 3730.191735] block drbd0: conn( StandAlone
> >   -> Unconnected )
> > Jun 30 15:38:10 storm kernel: [ 3730.191748] block drbd0: Starting
> >   receiver thread (from drbd0_worker [15487])
> > Jun 30 15:38:10 storm kernel: [ 3730.191780] block drbd0: receiver
> >   (re)started
> > Jun 30 15:38:10 storm kernel: [ 3730.191785] block drbd0: conn(
> >   Unconnected -> WFConnection )
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output:
> >   (drbd-storage:0:start:stderr) 0: Failure: (124) Device is attached to a
> >   disk (use detach first)
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output:
> >   (drbd-storage:0:start:stderr) Command 'drbdsetup 0 disk /dev/sdc1
> >   /dev/sdc1 internal
> > Jun 30 15:38:10 storm lrmd: [15233]: info: RA output:
> >   (drbd-storage:0:start:stderr) --set-defaults --create-device' terminated
> >   with exit code 10
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Called drbdadm -c
> >   /etc/drbd.conf --peer storm-b up r0
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Exit code 1
> > Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Command output:
> >
> > I made sure rcdrbd was stopped before starting rcopenais, so the failure
> > related to the device being attached arises during openais startup.
> >
> > *************************
> > * Result of ocf-tester: *
> > *************************
> >
> > storm:~ # ocf-tester -n drbd-storage -o drbd_resource="r0" \
> >     /usr/lib/ocf/resource.d/linbit/drbd
> > Beginning tests for /usr/lib/ocf/resource.d/linbit/drbd...
> > * rc=6: Validation failed. Did you supply enough options with -o ?
> > Aborting tests
> >
> > The only required parameter according to "crm ra info ocf:linbit:drbd"
> > is drbd_resource, so there shouldn't be any additional options required
> > to make ocf-tester work.
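[A note on the ocf-tester result: rc=6 comes from the agent's
validate-all action, and it can be triggered by anything the agent
validates, not only missing parameters (the drbd configuration file or
module state, for instance). Two illustrative things to try; the
drbdconf value here is an assumption borrowed from Martin's example:

    # pass the optional drbdconf parameter explicitly
    ocf-tester -n drbd-storage \
        -o drbd_resource="r0" \
        -o drbdconf="/etc/drbd.conf" \
        /usr/lib/ocf/resource.d/linbit/drbd

    # or trace the validation directly; OCF agents read their
    # parameters from OCF_RESKEY_* environment variables
    OCF_ROOT=/usr/lib/ocf OCF_RESKEY_drbd_resource="r0" \
        sh -x /usr/lib/ocf/resource.d/linbit/drbd validate-all

The traced run should show which check the agent fails before it exits
with rc=6.]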
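[On the "Device is attached to a disk (use detach first)" failure in the
log above: per the log, drbdsetup exits with code 10 because minor 0 is
already attached, i.e. the start action found /dev/drbd0 half-configured
rather than down. One plausible explanation is that an earlier start
attempt (or the drbd init script) left the device attached, so every
retry fails the same way until the minor is torn down. A cleanup sketch,
assuming a SLES11 init setup and that only the cluster should manage
drbd:

    # keep the init script from configuring drbd at boot, so it
    # cannot race the cluster-managed resource
    chkconfig drbd off

    # on each node, tear down any leftover state, then restart openais
    drbdadm down r0
    cat /proc/drbd        # minor 0 should report cs:Unconfigured

If the device is clean on both nodes and the starts still time out,
raising the start timeout, as Dejan suggests, is the next step.]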
> > Any suggestions for debugging and solutions would be most appreciated.
> >
> > Thanks,
> > Bart

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker