I'm running DRBD 8.3.13 on Debian Wheezy, Linux 3.2.20 and
every now and then my DRBD resources spontaneously switch from
cs:Connected to cs:WFConnection or the various syncing states and back
(according to "watch cat /proc/drbd").
I've sometimes seen "broken pipe" or even "protocol error"(!?) flashing
by briefly.
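`watch cat /proc/drbd` refreshes only every two seconds by default and keeps no history, so short flaps are easy to miss. A minimal polling sketch that logs every `cs:` change with a wall-clock timestamp might make it easier to line the flaps up with syslog. It assumes the DRBD 8.3 `/proc/drbd` layout, where each device line starts with `<minor>: cs:<state>`. The 0.2 s poll interval is an arbitrary choice.

```python
import os
import re
import sys
import time

# A device line in DRBD 8.3 /proc/drbd looks like:
#  6: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def parse_conn_states(text):
    """Return {minor: connection_state} parsed from /proc/drbd contents."""
    states = {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            states[int(m.group(1))] = m.group(2)
    return states


def watch(path="/proc/drbd", interval=0.2):
    """Poll `path` forever and log every cs: change with a wall-clock timestamp."""
    previous = {}
    while True:
        with open(path) as f:
            current = parse_conn_states(f.read())
        for minor, state in sorted(current.items()):
            old = previous.get(minor)
            if old is not None and old != state:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                sys.stdout.write("%s drbd%d: %s -> %s\n" % (stamp, minor, old, state))
                sys.stdout.flush()
        previous = current
        time.sleep(interval)


# Only start polling on a host that actually has DRBD loaded.
if __name__ == "__main__" and os.path.exists("/proc/drbd"):
    watch()
```

Leaving this running on both nodes and redirecting the output to a file gives a history of every transition, which `watch` does not.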
No luck debugging this so far. I've tried changing network cards,
switching between bonding modes, reverting to plain ethX interfaces
(instead of bonding), various MTU and txqueuelen values, and running
both with and without resource-only fencing (corosync). Nothing has
helped; this connection instability just seems to come and go.
Any better debugging ideas? Or maybe this is not a network issue at all?
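One way to approach the "is it the network?" question is to check whether the NIC error counters move while DRBD flaps. A minimal sketch, assuming the standard Linux sysfs counters under `/sys/class/net/<iface>/statistics/`. The interface name used below (`eth1`) is hypothetical and stands for the replication link; with bonding, the slave interfaces are worth checking as well.

```python
import os
import time

# A few of the standard per-interface counters in /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "rx_fifo_errors",
            "tx_errors", "tx_dropped")


def read_counters(iface, sysfs="/sys/class/net"):
    """Return {counter: value} for those COUNTERS that exist for `iface`."""
    base = os.path.join(sysfs, iface, "statistics")
    values = {}
    for name in COUNTERS:
        path = os.path.join(base, name)
        if os.path.exists(path):
            with open(path) as f:
                values[name] = int(f.read().strip())
    return values


def diff_counters(before, after):
    """Return only the counters that increased between two snapshots."""
    return dict((name, after[name] - before[name])
                for name in after
                if name in before and after[name] > before[name])


def sample(iface, seconds=60, sysfs="/sys/class/net"):
    """Snapshot the counters, wait `seconds`, and return what increased."""
    before = read_counters(iface, sysfs)
    time.sleep(seconds)
    return diff_counters(before, read_counters(iface, sysfs))
```

Calling `sample("eth1", 60)` on both nodes while a flap happens returns an empty dict if none of these counters moved, which would point away from link-level errors.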
Excerpt from DRBD configuration:
net {
  timeout 20;
  max-epoch-size 8192;
  max-buffers 128k;
  connect-int 2;
  ping-int 2;
  sndbuf-size 10M;
  rcvbuf-size 10M;
  ko-count 5;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  ping-timeout 2;
}
syncer {
  rate 100M;
  al-extents 3389;
  csums-alg crc32c;
  verify-alg crc32c;
}
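The timing options above do not all share a unit. As I read the drbd.conf(5) man page for 8.3, `timeout` and `ping-timeout` are in tenths of a second while `ping-int` and `connect-int` are in seconds, and `timeout` is supposed to be lower than both `connect-int` and `ping-int`. A small sketch that converts the values and checks that rule; the unit table is an assumption to verify against the man page of the installed version.

```python
# ASSUMPTION: units as read from the DRBD 8.3 drbd.conf(5) man page.
TENTHS = {"timeout", "ping-timeout"}     # tenths of a second
SECONDS = {"ping-int", "connect-int"}    # seconds


def to_seconds(net):
    """Convert the timing options of a net{} section to seconds."""
    out = {}
    for key, value in net.items():
        if key in TENTHS:
            out[key] = value / 10.0
        elif key in SECONDS:
            out[key] = float(value)
    return out


def check(net):
    """Return (values_in_seconds, notes) for the 'timeout must be lower' rule."""
    s = to_seconds(net)
    notes = []
    for other in ("connect-int", "ping-int"):
        if "timeout" in s and other in s and not s["timeout"] < s[other]:
            notes.append("timeout (%.1fs) is not lower than %s (%.1fs)"
                         % (s["timeout"], other, s[other]))
    if "timeout" in s and "ko-count" in net:
        # ko-count counts timeouts for a single write request on the peer.
        notes.append("ko-count %d x timeout = %.1fs before the peer is expelled"
                     % (net["ko-count"], net["ko-count"] * s["timeout"]))
    return s, notes


seconds, notes = check({"timeout": 20, "connect-int": 2, "ping-int": 2,
                        "ping-timeout": 2, "ko-count": 5})
for note in notes:
    print(note)
```

With the values from the excerpt, this puts `timeout` at 2.0 s, equal to (not lower than) `connect-int` and `ping-int`, and `ping-timeout` at only 0.2 s.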
Here's a syslog snippet demonstrating one whole cycle of this behavior:
kernel: [ 9827.966027] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9828.199039] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24132]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24132]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9828.238394] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9828.298906] block drbd6: bitmap WRITE of 83 pages took 15 jiffies
kernel: [ 9828.503024] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9831.788745] block drbd6: magic?? on data m: 0xa0816800 c: 5120 l: 0
kernel: [ 9831.788790] block drbd6: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
kernel: [ 9831.789573] block drbd6: asender terminated
kernel: [ 9831.789576] block drbd6: Terminating drbd6_asender
kernel: [ 9832.041526] block drbd6: Connection closed
kernel: [ 9832.041531] block drbd6: conn( ProtocolError -> Unconnected )
kernel: [ 9832.041535] block drbd6: receiver terminated
kernel: [ 9832.041537] block drbd6: Restarting drbd6_receiver
kernel: [ 9832.041539] block drbd6: receiver (re)started
kernel: [ 9832.041542] block drbd6: conn( Unconnected -> WFConnection )
kernel: [ 9832.457266] block drbd6: Handshake successful: Agreed network protocol version 96
kernel: [ 9832.457276] block drbd6: conn( WFConnection -> WFReportParams )
kernel: [ 9832.457357] block drbd6: Starting asender thread (from drbd6_receiver [29943])
kernel: [ 9832.457733] block drbd6: data-integrity-alg: <not-used>
kernel: [ 9832.457745] block drbd6: drbd_sync_handshake:
kernel: [ 9832.457748] block drbd6: self E8E3BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227 bits:0 flags:0
kernel: [ 9832.457751] block drbd6: peer E915DF859DCA76C9:E8E3BDC352C4C581:71C7A5DE96C51227:71C6A5DE96C51227 bits:12 flags:0
kernel: [ 9832.457754] block drbd6: uuid_compare()=-1 by rule 50
kernel: [ 9832.457758] block drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
kernel: [ 9832.883300] block drbd6: conn( WFBitMapT -> WFSyncUUID )
kernel: [ 9832.987097] block drbd6: updated sync uuid E8E4BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227
kernel: [ 9833.141291] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6
kernel: [ 9833.158129] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6 exit code 0 (0x0)
kernel: [ 9833.158135] block drbd6: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
kernel: [ 9833.158141] block drbd6: Began resync as SyncTarget (will sync 52 KB [13 bits set]).
kernel: [ 9833.415551] block drbd6: Resync done (total 1 sec; paused 0 sec; 52 K/sec)
kernel: [ 9833.415554] block drbd6: 23 % had equal checksums, eliminated: 12K; transferred 40K total 52K
kernel: [ 9833.415558] block drbd6: updated UUIDs E915DF859DCA76C8:0000000000000000:E8E4BDC352C4C580:E8E3BDC352C4C581
kernel: [ 9833.415563] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9833.575311] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24433]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24433]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9833.615746] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9833.661043] block drbd6: bitmap WRITE of 84 pages took 11 jiffies
kernel: [ 9833.772319] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9851.333540] block drbd6: magic?? on data m: 0x80816700 c: 19201 l: 0
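To compare several cycles, the `conn( ... )` transitions and `magic??` errors can be pulled out of syslog together with their kernel timestamps (seconds since boot). A rough sketch, assuming the message format shown in the excerpt above; the function names are made up for illustration.

```python
import re

# e.g. "kernel: [ 9831.788790] block drbd6: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) ..."
CONN_RE = re.compile(r"\[\s*(\d+\.\d+)\] block (drbd\d+): .*?conn\( (\w+) -> (\w+) \)")
# e.g. "kernel: [ 9831.788745] block drbd6: magic?? on data m: 0xa0816800 c: 5120 l: 0"
MAGIC_RE = re.compile(r"\[\s*(\d+\.\d+)\] block (drbd\d+): magic\?\?")


def transitions(lines):
    """Yield (uptime_seconds, device, event) for conn( ) changes and 'magic??' errors."""
    for line in lines:
        m = CONN_RE.search(line)
        if m:
            yield float(m.group(1)), m.group(2), "%s -> %s" % (m.group(3), m.group(4))
            continue
        m = MAGIC_RE.search(line)
        if m:
            yield float(m.group(1)), m.group(2), "magic??"


def connected_durations(events):
    """Return (device, seconds) for each Connected period that ended in the log."""
    since = {}
    durations = []
    for t, dev, event in events:
        if event.endswith("-> Connected"):
            since[dev] = t
        elif dev in since and event.startswith("Connected ->"):
            durations.append((dev, t - since.pop(dev)))
    return durations
```

Fed the lines of `/var/log/syslog`, e.g. `connected_durations(transitions(open("/var/log/syslog")))`, this shows how long each Connected period lasted. In the excerpt above, drbd6 stays Connected for about 3.8 s (9827.97 to 9831.79), and the drop to ProtocolError comes immediately after a `magic??` line.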