Hello,

I'm running the following version of DRBD on Linux 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by rmake-chroot@localhost.localdomain, 2010-09-22 23:08:46

I'm seeing lots of these messages:

=========================
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 11:51:51 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 11:51:51 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 12:06:52 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 12:06:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:06:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:06:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 12:06:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 12:21:52 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 12:21:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:21:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:21:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 12:21:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 12:34:40 foo-01 lrmd: [4060]: info: rsc:drbd0:1:63: monitor
May 20 12:36:52 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 12:36:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:36:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:36:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 12:36:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 12:51:52 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 12:51:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:51:52 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:51:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 12:51:52 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 12:59:12 foo-01 kernel: block drbd1: pdsk( UpToDate -> Diskless )
May 20 12:59:12 foo-01 kernel: block drbd1: Creating new current UUID
May 20 12:59:13 foo-01 kernel: block drbd1: real peer disk state = Consistent
May 20 12:59:13 foo-01 kernel: block drbd1: drbd_sync_handshake:
May 20 12:59:13 foo-01 kernel: block drbd1: self 7BFDD5CAEC7E3D71:D4618C068872B92F:85DD32B6D2F5BCF0:42ABBC298069B9B4 bits:0 flags:0
May 20 12:59:13 foo-01 kernel: block drbd1: peer D4618C068872B92E:0000000000000000:85DD32B6D2F5BCF0:42ABBC298069B9B4 bits:0 flags:2
May 20 12:59:13 foo-01 kernel: block drbd1: uuid_compare()=1 by rule 70
May 20 12:59:13 foo-01 kernel: block drbd1: conn( Connected -> WFBitMapS ) pdsk( Diskless -> Outdated )
May 20 12:59:13 foo-01 kernel: block drbd1: peer( Secondary -> Unknown ) conn( WFBitMapS -> TearDown )
May 20 12:59:13 foo-01 kernel: block drbd1: asender terminated
May 20 12:59:13 foo-01 kernel: block drbd1: Terminating asender thread
May 20 12:59:13 foo-01 kernel: block drbd1: oc_eds eund-2<>lc rd:sotsn eotiMpsz=06sn=1
May 20 12:59:13 foo-01 kernel: block drbd1: Handshake successful: Agreed network protocol version 94
May 20 12:59:13 foo-01 kernel: block drbd1: conn( WFConnection -> WFReportParams )
May 20 12:59:13 foo-01 kernel: block drbd1: Starting asender thread (from drbd1_receiver [20202])
May 20 12:59:13 foo-01 kernel: block drbd1: data-integrity-alg: <not-used>
May 20 12:59:13 foo-01 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( Outdated -> Diskless )
May 20 12:59:31 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 12:59:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:59:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 12:59:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 12:59:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:38 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:39 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:03:39 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:39 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:03:39 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:03:39 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:03:39 foo-01 kernel: block drbd1: State change failed: Need access to UpToDate data
May 20 13:03:39 foo-01 kernel: block drbd1: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/Diskless r--- }
May 20 13:03:39 foo-01 kernel: block drbd1: wanted = { cs:Connected ro:Primary/Secondary ds:Diskless/Diskless r--- }
May 20 13:03:39 foo-01 kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( Diskless -> DUnknown )
May 20 13:03:39 foo-01 kernel: block drbd1: sock was shut down by peer
May 20 13:03:39 foo-01 kernel: block drbd1: short read expecting header on sock: r=0
May 20 13:03:39 foo-01 kernel: block drbd1: meta connection shut down by peer.
May 20 13:03:39 foo-01 kernel: block drbd1: asender terminated
May 20 13:03:39 foo-01 kernel: block drbd1: Terminating asender thread
May 20 13:03:39 foo-01 kernel: block drbd1: Connection closed
May 20 13:03:39 foo-01 kernel: block drbd1: conn( Disconnecting -> StandAlone )
May 20 13:03:39 foo-01 kernel: block drbd1: receiver terminated
May 20 13:03:39 foo-01 kernel: block drbd1: Terminating receiver thread
May 20 13:04:31 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:04:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:04:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:04:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:04:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:19:31 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:19:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:19:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:19:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:19:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:34:31 foo-01 pengine: [4062]: notice: clone_print: Master/Slave Set: ms-drbd [drbd0]
May 20 13:34:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:34:31 foo-01 pengine: [4062]: notice: common_apply_stickiness: ms-drbd can fail 999999 more times on foo-01.trustblue.com before being forced off
May 20 13:34:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:0 (Slave foo-02.trustblue.com)
May 20 13:34:31 foo-01 pengine: [4062]: notice: LogActions: Leave resource drbd0:1 (Master foo-01.trustblue.com)
May 20 13:34:43 foo-01 lrmd: [4060]: info: rsc:drbd0:1:63: monitor
==========

I checked the RAID controller for errors but didn't see any.

My drbd.conf is something like:

#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd82/drbd.conf
#
global {
    usage-count no;
}
common {
    protocol C;
    startup {
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }
}
resource var_nsm {
    syncer {
        rate 333M;
    }
#   handlers {
#       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
#       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
#   }
    disk {
        on-io-error detach;
        no-disk-barrier;
        no-disk-flushes;
    }
    net {
        after-sb-1pri discard-secondary;
        max-buffers 8000;
        max-epoch-size 8000;
        sndbuf-size 0;
    }
    on foo-01.trustblue.com {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 172.20.20.1:7791;
        meta-disk internal;
    }
    on foo-02.trustblue.com {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 172.20.20.3:7791;
        meta-disk internal;
    }
}

Thanks
Shravan
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker