To your immediate problem: if you had configured fencing, DRBD would not
split-brain. Are you using Pacemaker or RHCS?
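For reference, a rough sketch of what resource-level fencing can look like in an 8.3 config when Pacemaker is the cluster manager. The handler paths are assumptions: they are where the DRBD packages commonly install the scripts, so check your own system. RHCS would need a different fence-peer handler.

```
resource r0 {
    disk {
        # assumption: resource-level fencing only, no STONITH
        fencing resource-only;
    }
    handlers {
        # assumed install paths; verify on your distribution
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```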
Secondly, 8.3.8 is very, very old. Upgrading to a newer 8.3.x version
would be a good idea.
Back to split-brain: DRBD detects a split-brain when the nodes reconnect
and it finds that both were Primary at some point while disconnected. To
recover, you need to tell DRBD which node to consider "good", then discard
the changes on the peer and let the good node sync to it.
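A minimal sketch of that manual recovery for 8.3.x, wrapped in two shell functions so the steps are explicit. The function names are made up for illustration; the drbdadm invocations use the 8.3 syntax (8.4 spells the discard option differently). Treat this as an outline and be sure which node you are discarding before running it.

```shell
# Sketch of manual split-brain recovery for DRBD 8.3.
# The function names are hypothetical; only the drbdadm calls matter.

# Run on the node whose changes will be thrown away (the "victim").
discard_local_changes() {
    res="$1"
    # Demote first, then reconnect while discarding local modifications.
    drbdadm secondary "$res" &&
    drbdadm -- --discard-my-data connect "$res"
}

# Run on the node whose data is kept (the "good" node).
# Only needed if that node is also in the StandAlone state.
reconnect_survivor() {
    res="$1"
    drbdadm connect "$res"
}
```

With the resource from this thread, that would be `discard_local_changes r0` on the node to be overwritten and `reconnect_survivor r0` on the good node.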
On 04/10/2013 08:08 AM, Shailesh Vaidya wrote:
I followed the same procedure (disabling the Ethernet card, etc.).
Afterwards, the DRBD status on the two nodes is:
[root@drbd1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbu...@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:4 nr:0 dw:12 dr:82 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4
[root@drbd1 ~]#
[root@drbd2 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbu...@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r----
    ns:0 nr:4 dw:56 dr:42 al:1 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4
[root@drbd2 ~]#
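The fields that matter for split-brain diagnosis in the output above are cs (connection state), ro (local/peer role) and ds (local/peer disk state). A small sketch of pulling just those out of /proc/drbd follows; the function name is hypothetical and the pattern assumes the 8.3 status line layout shown above.

```shell
# Print connection state, roles and disk states for each DRBD minor.
# Reads /proc/drbd-formatted text on stdin (8.3 layout assumed).
drbd_summary() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([^ ]*\) ro:\([^ ]*\) ds:\([^ ]*\).*/minor=\1 cs=\2 ro=\3 ds=\4/p'
}
```

Usage: `drbd_summary < /proc/drbd`. For the drbd1 output above this prints `minor=0 cs=StandAlone ro=Primary/Unknown ds=UpToDate/DUnknown`.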
/var/log/messages shows:
Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Apr 10 07:51:35 localhost kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Apr 10 07:51:35 localhost kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Apr 10 07:51:35 localhost kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Apr 10 07:51:35 localhost kernel: block drbd0: error receiving ReportState, l: 4!
Apr 10 07:51:35 localhost kernel: block drbd0: asender terminated
Apr 10 07:51:35 localhost kernel: block drbd0: Terminating asender thread
Apr 10 07:51:35 localhost kernel: block drbd0: Connection closed
Apr 10 07:51:35 localhost kernel: block drbd0: conn( Disconnecting -> StandAlone )
Apr 10 07:51:35 localhost kernel: block drbd0: receiver terminated
Apr 10 07:51:35 localhost kernel: block drbd0: Terminating receiver thread
Now, if I run ‘drbdadm connect r0’ on both machines, the log shows:
Apr 10 07:56:37 localhost kernel: block drbd0: uuid_compare()=100 by rule 90
Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Apr 10 07:56:37 localhost kernel: block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
Apr 10 07:56:37 localhost kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Apr 10 07:56:37 localhost kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Apr 10 07:56:37 localhost kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Apr 10 07:56:37 localhost kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Apr 10 07:56:37 localhost kernel: block drbd0: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
Apr 10 07:56:37 localhost kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Regards,
Shailesh Vaidya
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Dan Barker
Sent: Wednesday, April 10, 2013 5:16 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Not able to test Automatic split brain recovery policies
You don’t show the status of the nodes, but I imagine you have two
primary nodes. There is no handler specified for two primary nodes. Did
you have two primary, disconnected nodes?
It shouldn’t be possible to create split brain without writing on both
nodes.
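To illustrate the missing policy: a sketch of the net section with a two-primary policy added, plus a notification handler. The values are assumptions rather than a recommendation for this setup: disconnect is, as far as I know, the default for after-sb-2pri, and the handler path is where the DRBD packages commonly install the script.

```
net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    # assumed value: give up and stay disconnected when both were primary
    after-sb-2pri disconnect;
}
handlers {
    # assumed install path; mails root when split brain is detected
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}
```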
Dan
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Shailesh Vaidya
Sent: Wednesday, April 10, 2013 1:58 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Not able to test Automatic split brain recovery policies
Hello,
I am using DRBD 8.3.8.
I have configured the automatic split-brain recovery policies below in
/etc/drbd.conf:
net {
    max-buffers 2048;
    ko-count 4;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
}
Both of my machines are virtual machines, so they do not have an actual
back-to-back connection. To reproduce split-brain, I use the procedure
below:
1. On the Primary, disable the Ethernet card from ‘Virtual Machine properties’.
2. Wait for the Secondary to start the switchover, then re-enable the
   Ethernet card on the Primary.
The log shows me that a split-brain occurred; however, it also shows that
the connection was dropped.
Apr 9 10:30:15 drbd1 kernel: block drbd0: uuid_compare()=100 by rule 90
Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Apr 9 10:30:15 drbd1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Apr 9 10:30:15 drbd1 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Full DRBD conf file
[root@drbd1 ~]# cat /etc/drbd.conf
global {
    usage-count no;
}
resource r0 {
    protocol C;
    #incon-degr-cmd "echo !DRBD! pri on incon-degr | wall ; sleep 60 ; halt -f";
    on drbd1 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.55.199.51:7789;
        meta-disk internal;
    }
    on drbd2 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 10.55.199.52:7789;
        meta-disk internal;
    }
    disk {
        on-io-error detach;
    }
    net {
        max-buffers 2048;
        ko-count 4;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
    }
    syncer {
        rate 25M;
        al-extents 257; # must be a prime number
    }
    startup {
        wfc-timeout 20;
        degr-wfc-timeout 120; # 2 minutes.
    }
}
Is this a configuration issue, or is my testing procedure incorrect?
Regards,
Shailesh Vaidya
DISCLAIMER ========== This e-mail may contain privileged and
confidential information which is the property of Persistent Systems
Ltd. It is intended only for the use of the individual or entity to
which it is addressed. If you are not the intended recipient, you are
not authorized to read, retain, copy, print, distribute or use this
message. If you have received this communication in error, please notify
the sender and delete all copies of this message. Persistent Systems
Ltd. does not accept any liability for virus infected mails.
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?