Hi all,

In our three-node DRBD cluster with a single Primary we ran into the following
situation.



Check state:

root@host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
     disk:UpToDate
     host101011 role:Secondary
       peer-disk:UpToDate
     host101020 role:Secondary
       peer-disk:UpToDate

Everything looks nice.



Let's start an online verify:

root@host101010:/home/admins# drbdadm verify vmXXXX

While the verify is running:

root@host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
     disk:UpToDate
     host101011 role:Secondary
       replication:VerifyS peer-disk:UpToDate done:0.09
     host101020 role:Secondary
       replication:VerifyS peer-disk:UpToDate done:0.11



The results are remarkable:

[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify done (total 7112 sec; paused 0 sec; 29484 K/sec)
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify found 1 4k blocks out of sync!
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: repl( VerifyS -> Established )
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync exit code 0
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify done (total 7302 sec; paused 0 sec; 28720 K/sec)
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify found 41790818 4k blocks out of sync!
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: repl( VerifyS -> Established )
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync exit code 0

Verify claims that almost the whole 150G device is out of sync towards
host101020, while towards host101011 only a single 4k block differs.
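
For scale, a quick shell check of what 41790818 blocks of 4 KiB amount to:

echo $(( 41790818 * 4 / 1024 / 1024 ))   # 4k blocks -> GiB, prints 159

so the reported count covers essentially the entire backing device.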


Let's perform disconnect/connect:
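
That is, roughly:

root@host101010:/home/admins# drbdadm disconnect vmXXXX
root@host101010:/home/admins# drbdadm connect vmXXXX

The kernel log then shows: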

[Wed May  5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 1977682193 (0->1 496/16)
[Wed May  5 11:44:30 2021] drbd vmXXXX: State change 1977682193: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May  5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 1977682193 (0ms)
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Connection closed
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101011: Terminating receiver thread
[Wed May  5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 2927240878 (0->2 496/16)
[Wed May  5 11:44:30 2021] drbd vmXXXX: State change 2927240878: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFE
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Cluster is now split
[Wed May  5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 2927240878 (0ms)
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Connection closed
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vmXXXX host101020: Terminating receiver thread
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: conn( Unconnected -> Connecting )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( Unconnected -> Connecting )
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Handshake to peer 2 successful: Agreed network protocol version 119
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Peer authenticated using 20 bytes HMAC
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: Starting ack_recv thread (from drbd_r_vmXXXX [16989])
[Wed May  5 11:44:33 2021] drbd vmXXXX: Preparing cluster-wide state change 2251651464 (0->2 499/145)
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Handshake to peer 1 successful: Agreed network protocol version 119
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Peer authenticated using 20 bytes HMAC
[Wed May  5 11:44:33 2021] drbd vmXXXX host101011: Starting ack_recv thread (from drbd_r_vmXXXX [16987])
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: drbd_sync_handshake:
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:41790818 flags:120
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: peer AE3DDC8114ED4F28:0000000000000000:9D5D2097F94E7D30:F96063637CE0E650 bits:41790624 flags:120
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: uuid_compare()=no-sync by rule 38
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: bitmap content (41790818 bits)
[Wed May  5 11:44:33 2021] drbd vmXXXX: State change 2251651464: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May  5 11:44:33 2021] drbd vmXXXX: Committing cluster-wide state change 2251651464 (88ms)
[Wed May  5 11:44:33 2021] drbd vmXXXX host101020: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May  5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( Outdated -> UpToDate )
[Wed May  5 11:44:34 2021] drbd vmXXXX: Preparing cluster-wide state change 1922267911 (0->1 499/145)
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: drbd_sync_handshake:
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:1 flags:120
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: peer AE3DDC8114ED4F28:0000000000000000:E7DC3C8D8FA65BA8:9D5D2097F94E7D30 bits:1 flags:120
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: uuid_compare()=no-sync by rule 38
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: bitmap content (1 bits)
[Wed May  5 11:44:34 2021] drbd vmXXXX: State change 1922267911: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
[Wed May  5 11:44:34 2021] drbd vmXXXX: Committing cluster-wide state change 1922267911 (48ms)
[Wed May  5 11:44:34 2021] drbd vmXXXX host101011: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May  5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( Outdated -> UpToDate )


However, after the disconnect/reconnect, drbdadm status vmXXXX did not show any
synchronisation activity; the state was immediately UpToDate/UpToDate/UpToDate
on all three nodes.
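
(To double-check, the per-peer out-of-sync counters can also be queried
directly, e.g.

drbdsetup status vmXXXX --verbose --statistics | grep out-of-sync

which should show whether DRBD still tracks any out-of-sync blocks towards
either peer after the reconnect.)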


How can this be explained?



Debian GNU/Linux 10
drbd 9.0.28-1
drbd-utils 9.15 (from Buster Backports)

global {
     usage-count no;
     udev-always-use-vnr;
}

common {
     handlers {
     }
     startup {
     }
     options {
         auto-promote no;
     }
     disk {
         al-extents 250;
         c-plan-ahead 0;
     }
     net {
         protocol C;
         max-buffers 8000;
         max-epoch-size 8000;
         sndbuf-size 8M;
         cram-hmac-alg "sha1";
         verify-alg "sha1";
     }
}

resource "vmXXXX" {
     volume 0 {
         device      /dev/drbdXXXX0;
         disk        /dev/zvol/disk/data/vmXXXX-0;
         meta-disk   /dev/zvol/disk/meta/vmXXXX-0;
     }
     on host101011 {
         address 10.200.100.11:1XXXX;
         node-id 0;
     }
     on host101010 {
         address 10.200.100.10:1XXXX;
         node-id 1;
     }
     on host101020 {
         address 10.200.100.20:1XXXX;
         node-id 2;
     }
     connection-mesh {
         hosts host101011 host101010 host101020;
     }
     net {
         shared-secret "...";
     }
}


Thanks in advance,

Michael