2 x Debian Wheezy nodes, directly connected with 2 x bonded gigabit NICs (bond-mode 0).
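
For reference, the bond is a standard ifenslave balance-rr setup; roughly the following in /etc/network/interfaces (interface names and addresses here are placeholders, not the real values):

auto bond0
iface bond0 inet static
        address 192.168.10.1
        netmask 255.255.255.0
        mtu 9000
        bond-slaves eth1 eth2
        bond-mode balance-rr
        bond-miimon 100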

drbd version 8.3.13-2

kernel 3.2.0-4-amd64

Running the noop scheduler on both nodes, with the following sysctl and disk changes:

sysctl -w net.core.rmem_max=131071
sysctl -w net.core.wmem_max=131071
sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
sysctl -w net.core.netdev_max_backlog=1000
sysctl -w net.ipv4.tcp_congestion_control=reno

# raise tx queue length and turn off GRO on every ethX interface
for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
        ifconfig $i txqueuelen 1000
        ethtool -K $i gro off
done

sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_fin_timeout=60

# per-disk tuning: bump readahead, switch to noop, raise request size and queue depth
for i in sda sdb sdc sdd; do
        blockdev --setra 1024 /dev/$i
        echo noop > /sys/block/$i/queue/scheduler
        echo 16384 > /sys/block/$i/queue/max_sectors_kb
        echo 1024 > /sys/block/$i/queue/nr_requests
done
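
(For completeness: the sysctls and the blockdev/scheduler settings above don't persist across a reboot on their own; on Wheezy the sysctls can go into a file under /etc/sysctl.d/ and the disk loop into /etc/rc.local, e.g. something like:)

# /etc/sysctl.d/90-drbd-tuning.conf -- picked up by the procps init script at boot
net.ipv4.tcp_rmem = 4096 87380 3080192
net.ipv4.tcp_wmem = 4096 16384 3080192
net.ipv4.tcp_congestion_control = reno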


MTU 9000 on the underlying slave interfaces as well as the bond itself; the switch is configured the same.
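
A quick way to confirm jumbo frames actually make it end to end is a do-not-fragment ping; 9000 bytes minus 28 bytes of IP/ICMP headers leaves an 8972-byte payload (peer address below is a placeholder):

ping -M do -s 8972 -c 3 192.168.10.2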

drbd.conf sync settings


        net {
                allow-two-primaries;
                max-buffers 8192;
                max-epoch-size 8192;
                #unplug-watermark 128;
                #sndbuf-size 0;
                sndbuf-size 512k;
        }
        syncer {
                rate 1000M;
                #rate 24M;
                #group 1;
                al-extents 3833;
                #al-extents 257;
                #verify-alg sha1;
        }
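
(For reference on the rate setting: the DRBD 8.3 guide suggests keeping the resync rate at roughly 30% of the available replication bandwidth so a resync doesn't starve application I/O; applied to a single gigabit link that rule of thumb gives something like the block below, rather than the wide-open 1000M above.)

        syncer {
                rate 33M;       # ~30% of ~110MB/s on a single gigabit link
        }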


iperf between servers

[  5]  0.0-10.0 sec   388 MBytes   325 Mbits/sec
[  4]  0.0-10.2 sec   356 MBytes   293 Mbits/sec
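
(For reference, something along these lines reproduces that test; -P 2 runs two parallel streams so both bond slaves get exercised, and the address is a placeholder:)

iperf -s                          # on one node
iperf -c 192.168.10.1 -P 2 -t 10  # on the other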


I am storing users' maildirs on the NFSv4 share, and I have about 5 mail servers mounting it. Under normal conditions everything works fine and server performance is normal, but at certain times there is a burst of either IMAP connections or mail being written to the share. When this happens the NFS clients' CPU load starts climbing and access to the share becomes almost unresponsive. As soon as I disconnect the secondary node ("drbdadm disconnect <resource>") the load on the NFS clients drops and things start working again. If I then reconnect the node at a later stage, the resource resyncs fairly quickly with no noticeable load on the servers or clients, and things work fine for a while until the next major read/write burst. It is difficult for me to tell whether it is reads or writes that trigger the issue.
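
I can gather more data the next time it happens; unless there is something better to look at, I was planning on something along these lines on the primary and on one of the NFS clients while the stall is in progress:

cat /proc/drbd        # on the DRBD primary: connection state, pending/unacked counts
iostat -x 1 5         # on the primary: per-disk utilisation and await
vmstat 1 5            # on the primary: overall CPU and iowait
nfsstat -c            # on an NFS client: RPC call mix and retransmits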

Any help or pointers would be great.




--
Thank you,

Mark Adrian Coetser
[email protected]
http://www.tux-edo.co.za
http://www.tux-voip.co.za
cel: +27 76 527 8789

 *  This is complicated.  Has to do with interrupts.  Thus, I am
 *  scared witless.  Therefore I refuse to write this function. :-P
                -- From the maclinux patch
