Hi,
I'm running several Debian servers with Ganeti and DRBD.
Since I upgraded them to Debian 9 and DRBD 8.9.10-2 (from the Debian repo),
I have had a lot of issues with my DRBD resources: at random times, dmesg shows
some resources disconnecting and then reconnecting.
Today, for example:
[Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[Tue Aug 28 14:32:38 2018] drbd resource10: ack_receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_a_resource
[Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed
[Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
[Tue Aug 28 14:32:38 2018] drbd resource10: receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_r_resource
[Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed )
[Tue Aug 28 14:32:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless )
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Diskless -> Attaching )
[Tue Aug 28 14:32:40 2018] drbd resource10: Method to ensure write ordering: flush
[Tue Aug 28 14:32:40 2018] block drbd10: max BIO size = 262144
[Tue Aug 28 14:32:40 2018] block drbd10: Adjusting my ra_pages to backing device's (32 -> 256)
[Tue Aug 28 14:32:40 2018] block drbd10: drbd_bm_resize called with capacity == 314572800
[Tue Aug 28 14:32:40 2018] block drbd10: resync bitmap: bits=39321600 words=614400 pages=1200
[Tue Aug 28 14:32:40 2018] block drbd10: size = 150 GB (157286400 KB)
[Tue Aug 28 14:32:40 2018] block drbd10: recounting of set bits took additional 0 jiffies
[Tue Aug 28 14:32:40 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Attaching -> UpToDate )
[Tue Aug 28 14:32:40 2018] block drbd10: attached to UUIDs 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( StandAlone -> Unconnected )
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting receiver thread (from drbd_w_resource [10225])
[Tue Aug 28 14:32:40 2018] drbd resource10: receiver (re)started
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( Unconnected -> WFConnection )
[Tue Aug 28 14:32:41 2018] drbd resource10: Handshake successful: Agreed network protocol version 101
[Tue Aug 28 14:32:41 2018] drbd resource10: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[Tue Aug 28 14:32:41 2018] drbd resource10: Peer authenticated using 16 bytes HMAC
[Tue Aug 28 14:32:41 2018] drbd resource10: conn( WFConnection -> WFReportParams )
[Tue Aug 28 14:32:41 2018] drbd resource10: Starting ack_recv thread (from drbd_r_resource [10246])
[Tue Aug 28 14:32:41 2018] block drbd10: drbd_sync_handshake:
[Tue Aug 28 14:32:41 2018] block drbd10: self 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: peer 629F1036CD6CA2AF:0748EE11C429D3B5:FDAEFCD2E8D9890B:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: uuid_compare()=-1 by rule 50
[Tue Aug 28 14:32:41 2018] block drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[Tue Aug 28 14:32:41 2018] block drbd10: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFBitMapT -> WFSyncUUID )
[Tue Aug 28 14:32:41 2018] block drbd10: updated sync uuid 0749EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10 exit code 0 (0x0)
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[Tue Aug 28 14:32:41 2018] block drbd10: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
[Tue Aug 28 14:32:41 2018] block drbd10: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
[Tue Aug 28 14:32:41 2018] block drbd10: updated UUIDs 629F1036CD6CA2AE:0000000000000000:0749EE11C429D3B4:0748EE11C429D3B5
[Tue Aug 28 14:32:41 2018] block drbd10: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10 exit code 0 (0x0)
I have 4 different clusters running these same versions, and I see the same
problem on all of them.
It's not always the same resource.
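To see which resources are affected and how often, the disconnect events could be tallied per resource from the dmesg output. The following is only a minimal sketch, assuming log records in the format shown above; the function names join_wrapped and count_disconnects are hypothetical helpers, not part of DRBD or Ganeti. It first re-joins records that were hard-wrapped (continuation lines do not start with a bracketed timestamp), then counts the "Connected -> Disconnecting" transitions.

```python
import re
from collections import Counter

# Matches a resource-level record that contains the transition
# "conn( Connected -> Disconnecting )", e.g.
# "[Tue Aug 28 14:32:38 2018] drbd resource10: peer( ... ) conn( Connected -> Disconnecting ) ..."
DISCONNECT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] drbd (?P<res>\S+): .*conn\( Connected -> Disconnecting \)"
)


def join_wrapped(lines):
    """Re-join hard-wrapped records: a line that does not start with '['
    is treated as the continuation of the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("[") or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def count_disconnects(lines):
    """Return a Counter mapping resource name to number of disconnect events."""
    counts = Counter()
    for record in join_wrapped(lines):
        match = DISCONNECT_RE.match(record)
        if match:
            counts[match.group("res")] += 1
    return counts
```

For example, after saving the output of "dmesg -T" to a file (dmesg.txt is an assumed name), count_disconnects(open("dmesg.txt")) gives the per-resource totals, which can then be compared across nodes and clusters.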
Any idea what I can check?
Thanks a lot,
Nicolas