Lars,
Thank you very much for your explanation. If the error really was
"connection reset by peer", the situation becomes even stranger.
I have two resources on this cluster, r0 and r1, and only r1 had the
problem. If it had been a communication "hiccup", I would expect both
resources to be affected at the same time, but they weren't: the split
brain hit r1 only. My config file is below:
global {
    usage-count no;
}
common {
    protocol C;
}
resource r0 {
    device /dev/drbd1;
    disk /dev/sdb;
    meta-disk internal;
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        ping-timeout 20;
    }
    startup {
        wfc-timeout 100;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    on infplsm004 {
        address 192.168.10.9:7789;
    }
    on infplsm005 {
        address 192.168.10.10:7789;
    }
}
resource r1 {
    device /dev/drbd2;
    disk /dev/sdc;
    meta-disk internal;
    # This is to allow dual primary mode.
    # http://www.drbd.org/users-guide-emb/s-enable-dual-primary.html
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        ping-timeout 20;
    }
    startup {
        wfc-timeout 100;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    on infplsm004 {
        address 192.168.10.9:7790;
    }
    on infplsm005 {
        address 192.168.10.10:7790;
    }
}
Thank you,
Ivan
On 09/21/2011 10:15 PM, Lars Ellenberg wrote:
On Wed, Sep 21, 2011 at 10:08:42AM +1000, Ivan Pavlenko wrote:
Hi All,
Recently I had a split brain on my cluster. It was not a big issue,
but I still haven't found the reason for this glitch. I got the
following in my log file:
We call it a DRBD resource-internal split brain when there is a period
of time during which both nodes cannot communicate _and_ both have
been Primary.
Which means: whenever you run dual-primary DRBD and have a hiccup on
the replication link, that causes a DRBD "split brain", which is
perhaps better read as "potential data-set divergence".
Sep 20 18:44:35 infplsm004<kern.info> kernel: VMCIUtil: Updating context id from 0x775d2835 to 0x775d2835 on event 0.
Sep 20 18:44:35 infplsm004<kern.err> kernel: block drbd2: sock_recvmsg returned -104
Sep 20 18:44:35 infplsm004<kern.info> kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 20 18:44:35 infplsm004<kern.info> kernel: block drbd2: asender terminated
Sep 20 18:44:35 infplsm004<kern.info> kernel: block drbd2: Terminating asender thread
Sep 20 18:44:35 infplsm004<kern.err> kernel: block drbd2: short read expecting header on sock: r=-512
Sep 20 18:44:35 infplsm004<kern.info> kernel: block drbd2: Creating new current UUID
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: Connection closed
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: receiver terminated
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: Restarting receiver thread
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: receiver (re)started
Sep 20 18:44:36 infplsm004<kern.info> kernel: block drbd2: conn( Unconnected -> WFConnection )
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: Handshake successful: Agreed network protocol version 94
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: conn( WFConnection -> WFReportParams )
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: Starting asender thread (from drbd2_receiver [11360])
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: data-integrity-alg:<not-used>
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: drbd_sync_handshake:
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: self AD9C020C7BA6E149:51B8CD59E67A7227:01C987FB5F84C0D1:30241D96D32A31CF bits:1 flags:0
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: peer A2111F74640A099D:51B8CD59E67A7227:01C987FB5F84C0D0:30241D96D32A31CF bits:0 flags:0
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: uuid_compare()=100 by rule 90
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: helper command: /sbin/drbdadm initial-split-brain minor-2 exit code 0 (0x0)
Sep 20 18:44:38 infplsm004<kern.alert> kernel: block drbd2: Split-Brain detected but unresolved, dropping connection!
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2
Sep 20 18:44:38 infplsm004<kern.err> kernel: block drbd2: meta connection shut down by peer.
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: conn( WFReportParams -> NetworkFailure )
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: asender terminated
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: Terminating asender thread
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: conn( NetworkFailure -> Disconnecting )
Sep 20 18:44:38 infplsm004<kern.err> kernel: block drbd2: error receiving ReportState, l: 4!
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: Connection closed
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: conn( Disconnecting -> StandAlone )
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: receiver terminated
Sep 20 18:44:38 infplsm004<kern.info> kernel: block drbd2: Terminating receiver thread
I'd like to draw your attention to the first two entries: sock_recvmsg
on the DRBD socket returned code -104. What does it mean? Where can I
get information about error codes?
These are typically ordinary negative errno codes;
on my box, 104 is ECONNRESET, "Connection reset by peer".
Thank you in advance,
Ivan
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user