Hi,

We are using DRBD with a RHEL cluster in two nodes + diskless tiebreaker setup: 
Corosync, Pacemaker and DRBD(node1, node2, tiebreaker-diskless), our versions 
are kmod-drbd-9.0.27 and drbd-utils-9.12.1
We are testing failover scenarios on a physical environment with RedHat 7.6 by 
executing hard shutdowns on node1 and node 2 (we always start from a stable 
state with all nodes connected and up2date).
>From time to time when node2 is shut down (forced) the volume on node1 moves 
>to state Consistent from Up2Date after which (a couple of milliseconds later) 
>the tiebreaker reports an error resulting in the following drbdadm status:

node1
postgres-zabbix-data7790 role:Secondary suspended:quorum
disk:UpToDate quorum:no blocked:upper
node2 connection:Connecting
tiebreaker connection:Connecting

tiebreaker
 postgres-zabbix-data7790 role:Secondary suspended:quorum
  disk:Diskless quorum:no blocked:upper
  node1 role:StandAlone
  node2 connection:Connecting


resource postgres-zabbix-data7790 {

   options {
      quorum majority;
   }
   protocol C;

   startup {
      wfc-timeout 10;
      degr-wfc-timeout 5;
   }
   net {
      max-epoch-size 2048;
      max-buffers 2048;
      sndbuf-size 0;
      rcvbuf-size 0;
   }

   disk {
      on-io-error detach;
      c-max-rate 900M;
      c-min-rate 100M;
      c-fill-target 1M;
      resync-rate 300M;
   }

   on node1 {
      device /dev/drbd7790;
      disk /dev/mapper/lvmdata-postgres--zabbix--data7790;
      node-id 1;
      meta-disk internal;
      address 10.21.24.11:7790;
   }

   on node2 {
      device /dev/drbd7790;
      disk /dev/mapper/lvmdata-postgres--zabbix--data7790;
      meta-disk internal;
      node-id 2;
      address 10.21.24.12:7790;
   }


   on tiebreaker {
      device /dev/drbd7790;
      disk none;
      meta-disk internal;
      node-id 3;
      address 10.21.24.13:7790;
   }

   connection-mesh {
      hosts node1 node2 tiebreaker;
   }
}

Tiebreaker logs:
2021-02-23T23:32:24.299214+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: PingAck did not arrive in time.;
2021-02-23T23:32:24.299244+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.299266+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( Connected -> NetworkFailure ) peer( 
Primary -> Unknown );
2021-02-23T23:32:24.299305+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 node2: pdsk( UpToDate -> DUnknown ) repl( 
Established -> Off );
2021-02-23T23:32:24.299326+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: ack_receiver terminated;
2021-02-23T23:32:24.299349+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Terminating ack_recv thread;
2021-02-23T23:32:24.299372+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.299394+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.318231+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Restarting sender thread;
2021-02-23T23:32:24.318268+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Connection closed;
2021-02-23T23:32:24.318290+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.318312+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( NetworkFailure -> Unconnected );
2021-02-23T23:32:24.318337+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Restarting receiver thread;
2021-02-23T23:32:24.318370+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.318404+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.803484+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Got NegDReply; Sector 0s, len 131072.;
2021-02-23T23:32:24.803557+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.803587+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: State change failed: Need access to UpToDate data;
2021-02-23T23:32:24.803628+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 node1: Failed: pdsk( UpToDate -> Consistent 
);
2021-02-23T23:32:24.803652+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: drbd_req_destroy: Logic BUG rq_state: 
(0:300000, 2:104), completion_ref = 0;
2021-02-23T23:32:24.805509+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: susp-io( no -> quorum);
2021-02-23T23:32:24.805540+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: conn( Connected -> Disconnecting ) peer( 
Secondary -> Unknown );
2021-02-23T23:32:24.805570+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: quorum( yes -> no );
2021-02-23T23:32:24.805599+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 node1: pdsk( UpToDate -> DUnknown ) repl( 
Established -> Off );
2021-02-23T23:32:24.805627+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: error receiving P_STATE, e: -5 l: 0!;
2021-02-23T23:32:24.807322+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: ack_receiver terminated;
2021-02-23T23:32:24.807435+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: Terminating ack_recv thread;
2021-02-23T23:32:24.818354+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: Aborting remote state change 0 commit not 
possible;
2021-02-23T23:32:24.818465+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: Restarting sender thread;
2021-02-23T23:32:24.818542+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: Connection closed;
2021-02-23T23:32:24.818617+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: conn( Disconnecting -> StandAlone );
2021-02-23T23:32:24.818690+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node1: Terminating receiver thread;

Node1 logs:
2021-02-23T23:32:24.800547+00:00;ERR;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: PingAck did not arrive in time.;
2021-02-23T23:32:24.800610+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.800645+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( Connected -> NetworkFailure ) peer( 
Primary -> Unknown );
2021-02-23T23:32:24.800676+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: disk( UpToDate -> Consistent );
2021-02-23T23:32:24.800706+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 node2: pdsk( UpToDate -> DUnknown ) repl( 
Established -> Off );
2021-02-23T23:32:24.800742+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: ack_receiver terminated;
2021-02-23T23:32:24.800771+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Terminating ack_recv thread;
2021-02-23T23:32:24.800800+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.800829+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.800856+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 tiebreaker: receive_peer_dagatg(): 
source-set-bitmap by rule 30;
2021-02-23T23:32:24.800886+00:00;ERR;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Can not satisfy peer's read request, no 
local data.;
2021-02-23T23:32:24.802664+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker 
logic to keep;
2021-02-23T23:32:24.811534+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: sock was shut down by peer;
2021-02-23T23:32:24.811594+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: susp-io( no -> quorum);
2021-02-23T23:32:24.811626+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( Connected -> BrokenPipe ) peer( 
Secondary -> Unknown );
2021-02-23T23:32:24.811662+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: quorum( yes -> no );
2021-02-23T23:32:24.811692+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790 tiebreaker: pdsk( Diskless -> DUnknown ) 
repl( Established -> Off );
2021-02-23T23:32:24.811721+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: ack_receiver terminated;
2021-02-23T23:32:24.811750+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Terminating ack_recv thread;
2021-02-23T23:32:24.811780+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: Preparing cluster-wide state change 3151158077 (1->-1 
0/0);
2021-02-23T23:32:24.811814+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: Aborting cluster-wide state change 3151158077 (9ms) 
rv = -19;
2021-02-23T23:32:24.816641+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Aborting remote state change 0 commit not 
possible;
2021-02-23T23:32:24.816701+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Restarting sender thread;
2021-02-23T23:32:24.816925+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Connection closed;
2021-02-23T23:32:24.816962+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( NetworkFailure -> Unconnected );
2021-02-23T23:32:24.816989+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: Restarting receiver thread;
2021-02-23T23:32:24.817018+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 node2: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.826555+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Aborting remote state change 0 commit not 
possible;
2021-02-23T23:32:24.826622+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Restarting sender thread;
2021-02-23T23:32:24.826651+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Connection closed;
2021-02-23T23:32:24.826681+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( BrokenPipe -> Unconnected );
2021-02-23T23:32:24.826709+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Restarting receiver thread;
2021-02-23T23:32:24.826745+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.912545+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: Preparing cluster-wide state change 1764698033 (1->-1 
0/0);
2021-02-23T23:32:24.912605+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790: Committing cluster-wide state change 1764698033 (0ms);
2021-02-23T23:32:24.912637+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790/0 drbd7790: disk( Consistent -> UpToDate );
2021-02-23T23:32:34.322531+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: sock was shut down by peer;
2021-02-23T23:32:34.322577+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( Connecting -> BrokenPipe );
2021-02-23T23:32:34.322599+00:00;WARNING;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: short read (expected size 8);
2021-02-23T23:32:34.345524+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Aborting remote state change 0 commit not 
possible;
2021-02-23T23:32:34.345582+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Restarting sender thread;
2021-02-23T23:32:34.347540+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: Connection closed;
2021-02-23T23:32:34.347607+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( BrokenPipe -> Unconnected );
2021-02-23T23:32:35.347753+00:00;INFO;node1;P-/;[kernel/]; drbd 
postgres-zabbix-data7790 tiebreaker: conn( Unconnected -> Connecting );

Do you have any advice?

Best Regards,
Mihai

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user

Reply via email to