Hi,

I tried some of the network tuning suggestions made by other people on this 
list who faced similar problems:

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.rmem_default = 87380
net.core.wmem_default = 65536
net.core.optmem_max = 25165824
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_moderate_rcvbuf = 0
net.ipv4.tcp_synack_retries = 2
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_slow_start_after_idle = 0
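
To confirm that settings like these are actually live on a node (for example after a reboot or a kernel upgrade), something along these lines can compare them against /proc/sys. This is only a rough sketch: the helper names (parse_settings, proc_path, check) are made up for illustration, only a subset of the list above is included, and the simple dot-to-slash mapping assumes none of the keys contain an interface name with dots in it.

```python
from pathlib import Path

# Subset of the settings listed above, in sysctl.conf syntax.
SETTINGS = """
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_slow_start_after_idle = 0
"""


def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    wanted = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Collapse whitespace so multi-value settings compare cleanly.
        wanted[key.strip()] = " ".join(value.split())
    return wanted


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as net.core.rmem_max to its file under root."""
    return Path(root, *key.split("."))


def check(wanted, root="/proc/sys"):
    """Return {key: (wanted, actual)} for settings that differ or are missing."""
    diffs = {}
    for key, value in wanted.items():
        try:
            # /proc/sys separates multiple values with tabs; normalise them.
            actual = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            actual = None  # key not present on this kernel, or not readable
        if actual != value:
            diffs[key] = (value, actual)
    return diffs


if __name__ == "__main__":
    for key, (want, have) in check(parse_settings(SETTINGS)).items():
        print(f"{key}: wanted {want!r}, running kernel has {have!r}")
```

No output means every listed value matches the running kernel. A key reported with None does not exist on that kernel, which can happen when moving between kernel versions.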

Some of these seemed to help a bit: it took longer to trigger the issue 
(about 10-15 minutes instead of 5).  However, what ultimately solved the 
issue was upgrading the kernel to 3.19.  I had to update the Mellanox 
drivers at the same time, so the fix could have come from either change.  
Either way, I have now run a "rados bench write" for over 100 minutes with 
no sign of the issue.

Thanks,
Brendan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com