Hi

I'm running several Debian servers with Ganeti and DRBD.

Since I upgraded them to Debian 9 and drbd 8.9.10-2 (from the Debian repo), I've
had a lot of issues with my DRBD resources. Some resources randomly show up as
disconnected in dmesg.

For example, today:

[Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[Tue Aug 28 14:32:38 2018] drbd resource10: ack_receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_a_resource
[Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed
[Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
[Tue Aug 28 14:32:38 2018] drbd resource10: receiver terminated
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_r_resource
[Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed )
[Tue Aug 28 14:32:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless )
[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Diskless -> Attaching )
[Tue Aug 28 14:32:40 2018] drbd resource10: Method to ensure write ordering: flush
[Tue Aug 28 14:32:40 2018] block drbd10: max BIO size = 262144
[Tue Aug 28 14:32:40 2018] block drbd10: Adjusting my ra_pages to backing device's (32 -> 256)
[Tue Aug 28 14:32:40 2018] block drbd10: drbd_bm_resize called with capacity == 314572800
[Tue Aug 28 14:32:40 2018] block drbd10: resync bitmap: bits=39321600 words=614400 pages=1200
[Tue Aug 28 14:32:40 2018] block drbd10: size = 150 GB (157286400 KB)
[Tue Aug 28 14:32:40 2018] block drbd10: recounting of set bits took additional 0 jiffies
[Tue Aug 28 14:32:40 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[Tue Aug 28 14:32:40 2018] block drbd10: disk( Attaching -> UpToDate )
[Tue Aug 28 14:32:40 2018] block drbd10: attached to UUIDs 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( StandAlone -> Unconnected )
[Tue Aug 28 14:32:40 2018] drbd resource10: Starting receiver thread (from drbd_w_resource [10225])
[Tue Aug 28 14:32:40 2018] drbd resource10: receiver (re)started
[Tue Aug 28 14:32:40 2018] drbd resource10: conn( Unconnected -> WFConnection )
[Tue Aug 28 14:32:41 2018] drbd resource10: Handshake successful: Agreed network protocol version 101
[Tue Aug 28 14:32:41 2018] drbd resource10: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[Tue Aug 28 14:32:41 2018] drbd resource10: Peer authenticated using 16 bytes HMAC
[Tue Aug 28 14:32:41 2018] drbd resource10: conn( WFConnection -> WFReportParams )
[Tue Aug 28 14:32:41 2018] drbd resource10: Starting ack_recv thread (from drbd_r_resource [10246])
[Tue Aug 28 14:32:41 2018] block drbd10: drbd_sync_handshake:
[Tue Aug 28 14:32:41 2018] block drbd10: self 0748EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: peer 629F1036CD6CA2AF:0748EE11C429D3B5:FDAEFCD2E8D9890B:FDADFCD2E8D9890B bits:0 flags:0
[Tue Aug 28 14:32:41 2018] block drbd10: uuid_compare()=-1 by rule 50
[Tue Aug 28 14:32:41 2018] block drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[Tue Aug 28 14:32:41 2018] block drbd10: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFBitMapT -> WFSyncUUID )
[Tue Aug 28 14:32:41 2018] block drbd10: updated sync uuid 0749EE11C429D3B4:0000000000000000:FDAEFCD2E8D9890A:FDADFCD2E8D9890B
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true before-resync-target minor-10 exit code 0 (0x0)
[Tue Aug 28 14:32:41 2018] block drbd10: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[Tue Aug 28 14:32:41 2018] block drbd10: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
[Tue Aug 28 14:32:41 2018] block drbd10: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
[Tue Aug 28 14:32:41 2018] block drbd10: updated UUIDs 629F1036CD6CA2AE:0000000000000000:0749EE11C429D3B4:0748EE11C429D3B5
[Tue Aug 28 14:32:41 2018] block drbd10: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10
[Tue Aug 28 14:32:41 2018] block drbd10: helper command: /bin/true after-resync-target minor-10 exit code 0 (0x0)

I have 4 different clusters on these same versions, and I see the same
problem on all of them.

It's not always the same resource.

Any idea what I can check?
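
To see which resources are affected and how often, the disconnect events could be tallied per resource. This is only a minimal sketch: it assumes `dmesg -T` output with one event per line (unwrapped, unlike the mail above), and the names `count_disconnects` and `LINE_RE` are made up for illustration. It only counts the `Connected -> Disconnecting` transition seen in the log above and does not explain the cause.

```python
import re
import sys
from collections import Counter

# Matches a resource-level line such as:
# [Tue Aug 28 14:32:38 2018] drbd resource10: peer( ... ) conn( Connected -> Disconnecting ) ...
# Lines starting with "block drbdN:" are deliberately not matched.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+drbd\s+(?P<res>[^\s:]+):.*"
    r"conn\(\s*Connected\s*->\s*Disconnecting\s*\)"
)


def count_disconnects(lines):
    """Count 'Connected -> Disconnecting' transitions per resource name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("res")] += 1
    return counts


if __name__ == "__main__":
    # Read dmesg output from stdin and print resources, most affected first.
    for resource, n in count_disconnects(sys.stdin).most_common():
        print(f"{resource}\t{n}")
```

If saved as, say, `drbd_disconnects.py` (hypothetical file name), it could be run as `dmesg -T | python3 drbd_disconnects.py` on each node to compare the clusters.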

Thanks a lot,
Nicolas
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
