On 2019-04-17 12:20 p.m., JCA wrote:
I have a two-node cluster, consisting of two CentOS 7 VMs, as follows:

Cluster name: ClusterOne
Stack: corosync
Current DC: two (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Wed Apr 17 09:43:42 2019
Last change: Wed Apr 17 09:39:52 2019 by root via cibadmin on one

2 nodes configured
4 resources configured

Online: [ one two ]

Full list of resources:

 MyAppCluster (ocf::myapps:MyApp): Started one
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ one ]
     Slaves: [ two ]
 DrbdFS (ocf::heartbeat:Filesystem): Started one

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The DRBD software that I am using is the following:

drbd84-utils.x86_64 9.6.0-1.el7.elrepo                 @elrepo
kmod-drbd84.x86_64  8.4.11-1.1.el7_6.elrepo        @elrepo

The nodes have been configured to share an ext4 partition, DrbdFS has been configured to start before MyAppCluster, and the ClusterOne cluster has been configured to start automatically at boot time.
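For reference, the ordering and colocation described above would typically be expressed along these lines with pcs (this is a sketch, assuming the pcs 0.9.x syntax shipped with CentOS 7; the resource names are taken from the status output above):

```
# Promote DRBD on a node before mounting the filesystem there,
# and mount the filesystem before starting the application
pcs constraint order promote DrbdDataClone then start DrbdFS
pcs constraint order DrbdFS then MyAppCluster
pcs constraint colocation add DrbdFS with master DrbdDataClone INFINITY
pcs constraint colocation add MyAppCluster with DrbdFS INFINITY
```

The colocation rules keep the filesystem and the application pinned to whichever node currently holds the DRBD master role.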

The setup above works, in that node two takes over from node one when the latter becomes unreachable, and the DrbdFS filesystem automatically becomes available to node two, at the correct mount point, in that situation.

Now, when I reboot one and two, occasionally - but often enough to make me feel uneasy - DrbdFS comes up in a split-brain condition. What follows are the boot-time syslog traces I typically get in such a case:

Apr 17 09:35:59 one pengine[3663]:  notice:  * Start      ClusterOne    (           one )
Apr 17 09:35:59 one pengine[3663]:  notice:  * Start      DrbdFS        (           one )
Apr 17 09:35:59 one pengine[3663]:  notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-560.bz2
Apr 17 09:35:59 one crmd[3664]:  notice: Initiating monitor operation DrbdData_monitor_30000 on two
Apr 17 09:35:59 one crmd[3664]:  notice: Initiating start operation DrbdFS_start_0 locally on one
Apr 17 09:35:59 one kernel: drbd myapp-data: Handshake successful: Agreed network protocol version 101
Apr 17 09:35:59 one kernel: drbd myapp-data: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Apr 17 09:35:59 one kernel: drbd myapp-data: conn( WFConnection -> WFReportParams )
Apr 17 09:35:59 one kernel: drbd myapp-data: Starting ack_recv thread (from drbd_r_myapp-dat [4406])
Apr 17 09:35:59 one kernel: block drbd1: drbd_sync_handshake:
Apr 17 09:35:59 one kernel: block drbd1: self 002DDA8B166FC899:8DD977B102052FD2:BE3891694D7BCD54:BE3791694D7BCD54 bits:0 flags:0
Apr 17 09:35:59 one kernel: block drbd1: peer D12D1947C4ECF940:8DD977B102052FD2:BE3891694D7BCD54:BE3791694D7BCD54 bits:32 flags:0
Apr 17 09:35:59 one kernel: block drbd1: uuid_compare()=100 by rule 90
Apr 17 09:35:59 one kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
Apr 17 09:35:59 one Filesystem(DrbdFS)[4531]: INFO: Running start for /dev/drbd1 on /var/lib/myapp
Apr 17 09:35:59 one kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
Apr 17 09:35:59 one kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
Apr 17 09:35:59 one kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Apr 17 09:35:59 one kernel: drbd myapp-data: meta connection shut down by peer.
Apr 17 09:35:59 one kernel: drbd myapp-data: conn( WFReportParams -> NetworkFailure )
Apr 17 09:35:59 one kernel: drbd myapp-data: ack_receiver terminated
Apr 17 09:35:59 one kernel: drbd myapp-data: Terminating drbd_a_myapp-dat
Apr 17 09:35:59 one kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)

   Fixing the problem by manual intervention, once both nodes are up and running, is not difficult. However, I would like to understand why the split-brain condition sometimes arises on booting up and, more importantly, how to prevent it from happening, if at all possible.

   Suggestions?

Set up stonith in Pacemaker and, once it is tested, configure fencing in DRBD. This is exactly what fencing is for.
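As a sketch of what that involves (the stonith device and its parameters below are hypothetical placeholders; pick an agent that matches your hypervisor or hardware), the Pacemaker side looks something like:

```
# Hypothetical example: a stonith device per node, then enable stonith.
# Replace fence_vmware_soap and its options with an agent that fits
# your environment ("pcs stonith list" shows what is available).
pcs stonith create fence-one fence_vmware_soap \
    pcmk_host_list=one ipaddr=... login=... passwd=...
pcs stonith create fence-two fence_vmware_soap \
    pcmk_host_list=two ipaddr=... login=... passwd=...
pcs property set stonith-enabled=true
```

On the DRBD 8.4 side, resource-level fencing hooks DRBD into the cluster so that a node with possibly outdated data refuses to promote until the peer has been dealt with. A fragment for the resource shown in the logs above:

```
# /etc/drbd.d/myapp-data.res (fragment)
resource myapp-data {
  net {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With this in place, a node that loses its replication link at boot gets fenced or constrained instead of independently going Primary, which is the scenario that produces the split-brain you are seeing.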

digimer

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
