Hi,
Thanks for your reply.
If I modify the configuration like this in global_common.conf:
global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification
}
common {
protocol C;
handlers {
# The following 3 handlers were disabled due to #576511.
# Please check the DRBD manual and enable them, if they make sense in your setup.
# pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
# local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
}
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
}
disk {
# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
# no-disk-drain no-md-flushes max-bio-bvecs
}
net {
ko-count 2;
timeout 50;
# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
}
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
}
}
If I apply this config on the secondary node, will I get a proper
disconnection of the slave when we have hardware problems like the one
in my previous post?
Thanks
Matthieu Lejeune
On 5/03/14 11:32, Philip Gaw wrote:
Hi Matthieu,
On 05/03/2014 07:29, Matthieu Lejeune wrote:
Hi all,
I had a problem last night with a DRBD Primary/Slave pair.
The slave experienced a hardware issue (the LSI controller froze).
It seems the master held I/O, waiting for the slave to respond until
the timeout.
This caused all targets exported through InfiniBand to be
disconnected from the master.
So, in practice, the master stopped responding because of a failure on
the slave.
I had to hard reboot (power cycle) the slave because udev wasn't
responding and did not allow a normal reboot.
After the slave rebooted, DRBD did reconnect. It was in status pri/sec
uptodate/uptodate.
But the LSI controller immediately timed out, causing the same issue a
second time.
How can we prevent an issue on the slave from impacting the master?
Have a look at ko-count:
ko-count number
In case the secondary node fails to complete a single write
request for count times the timeout, it is expelled from the
cluster. (I.e. the primary node goes into StandAlone mode.) The
default value is 0, which disables this feature.
http://www.drbd.org/users-guide/re-drbdconf.html
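As a rough illustration of how the two options combine, here is a minimal sketch of a net section, assuming DRBD 8.3 syntax, where timeout is expressed in tenths of a second. The values are the ones proposed in this thread, not a recommendation. With them, the primary would expel a peer that fails to complete a write for roughly 2 x 5 = 10 seconds.

```
# Sketch only: assumes DRBD 8.3 syntax; the values are illustrative.
common {
  net {
    timeout  50;   # unit is 0.1 s, so 5 seconds (the default is 60)
    ko-count 2;    # expel the peer after 2 x timeout, about 10 seconds
  }
}
```

Note that each option needs a terminating semicolon, and that timeout should stay below connect-int and ping-int. Since the primary is the node that expels its peer, the setting has to be in effect on the primary, so it would normally go into the common section shared by both nodes.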
Thank you.
Matthieu Lejeune
drbd8-utils : 2:8.3.13-2
amd64 RAID 1 over tcp/ip for Linux utilities
Debian :
root@ifprdstor8a:~/trunk# cat /proc/version
Linux version 3.2.0-4-amd64 ([email protected]) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.51-1
root@ifprdstor8a:~/trunk#
We are using SCST/SRPT from the trunk version of 7 January 2014.
Here is the config:
*DRBD global:*
root@ifprdstor8a:/etc/drbd.d# cat global_common.conf
global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification
}
common {
protocol C;
handlers {
# The following 3 handlers were disabled due to #576511.
# Please check the DRBD manual and enable them, if they make sense in your setup.
# pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
# local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
}
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
}
disk {
# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
# no-disk-drain no-md-flushes max-bio-bvecs
}
net {
# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
}
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
}
}
*Resource configuration:*
root@ifprdstor8a:/etc/drbd.d# cat DSA801.res
resource DSA801 {
protocol C;
startup {
wfc-timeout 0;
}
disk {
on-io-error detach;
}
syncer {
rate 400M;
verify-alg md5;
}
on ifprdstor8a {
device /dev/drbd1;
disk /dev/sda;
address 10.13.1.5:7788;
meta-disk internal;
}
on ifprdstor8b {
device /dev/drbd1;
disk /dev/sda;
address 10.13.1.6:7788;
meta-disk internal;
}
}
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user