On 11/11/2013 15:10, Mark Coetser wrote:
2 x Debian wheezy nodes, directly connected with 2 x bonded gigabit NICs
(bond-mode 0).

drbd version 8.3.13-2

kernel 3.2.0-4-amd64

running the noop scheduler on both nodes, with the following sysctl/disk changes:

sysctl -w net.core.rmem_max=131071
sysctl -w net.core.wmem_max=131071
sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
sysctl -w net.core.netdev_max_backlog=1000
sysctl -w net.ipv4.tcp_congestion_control=reno

for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
         ifconfig $i txqueuelen 1000
         ethtool -K $i gro off
done

sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_fin_timeout=60

for i in sda sdb sdc sdd; do
         blockdev --setra 1024 /dev/$i
         echo noop > /sys/block/$i/queue/scheduler
         echo 16384 > /sys/block/$i/queue/max_sectors_kb
         echo 1024 > /sys/block/$i/queue/nr_requests
done
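None of the above survives a reboot by itself; as a rough sketch (file names
are only examples), the sysctls can go into /etc/sysctl.d/ and the per-disk
bits can be re-applied from /etc/rc.local on wheezy:

# /etc/sysctl.d/90-drbd-tuning.conf (example file name), read at boot on Debian
net.core.rmem_max = 131071
net.core.wmem_max = 131071
net.ipv4.tcp_rmem = 4096 87380 3080192
net.ipv4.tcp_wmem = 4096 16384 3080192
# ...plus the remaining sysctls above in the same form

# /etc/rc.local (before the final "exit 0"): re-run the per-disk loop
for i in sda sdb sdc sdd; do
        blockdev --setra 1024 /dev/$i
        echo noop > /sys/block/$i/queue/scheduler
        echo 16384 > /sys/block/$i/queue/max_sectors_kb
        echo 1024 > /sys/block/$i/queue/nr_requests
done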


MTU 9000 on the underlying bonded interfaces as well as on the bond itself;
the switch is configured the same.
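A quick way to confirm jumbo frames actually make it end to end across the
bond and the switch (the peer address below is just a placeholder):

# 9000 byte MTU - 28 bytes of IP+ICMP headers = 8972 byte payload;
# -M do sets the don't-fragment bit, so this only succeeds if every
# hop really carries jumbo frames (192.168.1.2 = peer node, placeholder)
ping -M do -s 8972 -c 3 192.168.1.2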

drbd.conf sync settings


         net {
                 allow-two-primaries;
                 max-buffers 8192;
                 max-epoch-size 8192;
                 #unplug-watermark 128;
                 #sndbuf-size 0;
                 sndbuf-size 512k;
         }
         syncer {
                 rate 1000M;
                 #rate 24M;
                 #group 1;
                 al-extents 3833;
                 #al-extents 257;
                 #verify-alg sha1;
         }
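For what it's worth, after editing drbd.conf these settings can be pushed to
a running resource without taking it down (the resource name is only an
example):

drbdadm adjust r0      # re-read the config and apply net/syncer changes ("r0" is a placeholder)
drbdadm adjust all     # or do it for every configured resource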


iperf between servers

[  5]  0.0-10.0 sec   388 MBytes   325 Mbits/sec
[  4]  0.0-10.2 sec   356 MBytes   293 Mbits/sec
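Those numbers look low for bonded gigabit; bond-mode 0 (balance-rr) tends to
reorder packets, which can hurt a single TCP stream. It may be worth repeating
the test with several parallel streams (host name is a placeholder):

# server side, on one node
iperf -s
# client side, from the other node: 4 parallel streams for 30 seconds,
# reported in Mbit/s ("node2" is a placeholder)
iperf -c node2 -P 4 -t 30 -f m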


I am storing users' maildirs on the nfs4 share and I have about 5 mail
servers mounting the nfs share. Under normal conditions things work fine
and the servers' performance is normal, but at certain times there is a
large number of either imap connections or mail being written to the nfs
share. When this happens the nfs clients' cpu load starts climbing and
access to the nfs share becomes almost unresponsive. As soon as I
disconnect the secondary node ("drbdadm disconnect resource") the load on
the nfs clients drops and things start working again. If I then reconnect
the node at a later stage, the resource resyncs fairly quickly with no
noticeable load on the servers or clients, and things work fine for a
while until the next major read/write. It's difficult for me to tell
exactly whether it's a read or a write at the time of the issue.
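To narrow down whether it's reads or writes when the share goes
unresponsive, running something like this on the NFS server during an
incident should help:

cat /proc/drbd        # DRBD state, pending/unacked counters, resync activity
iostat -x 1           # per-device r/s vs w/s, queue size and await times
nfsstat -s            # server-side NFS operation counts (reads vs writes)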

Any help or pointers would be great.





Does anyone have any input on this? I modified the syncer/net configs as below, which seems to have improved the load issue, though I still had about an hour of issues yesterday.

        net {
                allow-two-primaries;
                max-buffers 8192;
                max-epoch-size 8192;
                unplug-watermark 128;
                #sndbuf-size 0;
                sndbuf-size 512k;
        }
        syncer {
                # Adaptive syncer rate: let DRBD decide the best sync speed
                #   initial sync rate
                rate 100M;
                #   size of the rate adaptation window
                c-plan-ahead 20;
                #   min/max rate
                # The network will only allow up to ~110MB/s, but verify and
                # identical-block resyncs use very little network bandwidth
                c-max-rate 800M;
                # amount of in-flight resync data to aim for (impacts the
                # length of the wait queue)
                c-fill-target 100k;
                # limit the bandwidth available for resync on the primary
                # node when DRBD detects application I/O
                c-min-rate 8M;

                al-extents 1023;
                #al-extents 3389;
        }
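With the dynamic controller in place, the effective resync speed and the
settings a running resource actually uses can be checked with (device name
is just an example):

watch -n1 cat /proc/drbd       # resync progress and current speed
drbdsetup /dev/drbd0 show      # effective net/syncer settings in use (drbd0 is a placeholder)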




Thank you,

Mark Adrian Coetser