Hi,

I'm using haproxy on CentOS 6.5 KVM virtual machines to load-balance
some ldap traffic. Both virtual servers (haproxy and the ldap server) are
running on the same KVM host (for testing I disabled the other ldap
servers in the balanced setup).
I'm constantly seeing connection resets during data transfer (SD
termination state) in the error logs, even though the traffic is really
low (still in the testing phase). I repackaged 1.5.3 for CentOS to see
whether that resolves the issue, but it does not. I should add that the
ldap client seems to be working just fine.
I already tried playing with sysctl settings, but to no avail, so I
could use some help here :-)
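
For what it's worth, the SD lines can be counted straight from the log
with something like this (the path is just an assumption, i.e. wherever
rsyslog writes the local3 facility; /var/log/haproxy.log here):

grep -c ' SD ' /var/log/haproxy.log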

Some info, settings and logs:

My basic haproxy config:
global
    daemon
    maxconn 4096
    log         127.0.0.1 local3
    stats socket    /tmp/haproxy

defaults
    #timeout connect 5000ms
    #timeout client 50000ms
    #timeout server 50000ms
    timeout connect 3s
    #timeout client 5m
    #timeout server 5m
    timeout client 3605s
    timeout server 3605s
    log    global
    option tcplog
    option dontlog-normal
    option redispatch

# Ldap
listen in-389
    maxconn 4000
    bind 172.18.235.96:389
    mode tcp
    balance leastconn
    option ldap-check
    option log-health-checks
    option srvtcpka
    server server1 a.b.c.d:389 maxconn 1024 weight 50 check
    server server2 e.f.g.h:389 maxconn 1024 weight 50 check
    server server3 i.j.k.l:389    maxconn 1024 weight 10 check
    server server4 m.n.o.p:389    maxconn 1024 weight 10 check
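
As a side note, the config can be syntax-checked before (re)loading it
with haproxy's check mode (the config path is an assumption, just where
the CentOS package normally puts it):

haproxy -c -f /etc/haproxy/haproxy.cfg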


echo "show info" | socat unix-connect:/tmp/haproxy stdio
Name: HAProxy
Version: 1.5.3
Release_date: 2014/07/25
Nbproc: 1
Process_num: 1
Pid: 28679
Uptime: 0d 0h06m03s
Uptime_sec: 363
Memmax_MB: 0
Ulimit-n: 8233
Maxsock: 8233
Maxconn: 4096
Hard_maxconn: 4096
CurrConns: 55
CumConns: 538
CumReq: 557
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 1
ConnRateLimit: 0
MaxConnRate: 28
SessRate: 1
SessRateLimit: 0
MaxSessRate: 28
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 69
Run_queue: 1
Idle_pct: 100

[root@bpmgt0002lb a540208]# echo "show pools" | socat unix-connect:/tmp/haproxy stdio
Dumping pools usage. Use SIGQUIT to flush them.
  - Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
  - Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  - Pool channel (80 bytes) : 116 allocated (9280 bytes), 114 used, 1 users [SHARED]
  - Pool task (112 bytes) : 71 allocated (7952 bytes), 70 used, 1 users [SHARED]
  - Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  - Pool connection (320 bytes) : 116 allocated (37120 bytes), 113 used, 1 users [SHARED]
  - Pool hdr_idx (416 bytes) : 2 allocated (832 bytes), 1 used, 1 users [SHARED]
  - Pool session (864 bytes) : 58 allocated (50112 bytes), 57 used, 1 users [SHARED]
  - Pool requri (1024 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  - Pool buffer (16416 bytes) : 116 allocated (1904256 bytes), 114 used, 1 users [SHARED]
Total: 10 pools, 2009712 bytes allocated, 1974368 used.

echo "show stat" | socat unix-connect:/tmp/haproxy stdio (disabled
some backends for easier testing):
in-389,FRONTEND,,,57,60,4000,1050,1109359,55405656,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,28,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
in-389,server1,0,0,51,54,1024,1032,1096497,55364671,,0,,0,833,0,0,UP,50,1,0,0,0,1219,0,,1,2,1,,1032,,2,1,,28,L7OK,0,0,,,,,,,0,,,,0,833,,,,,1,Success,,0,0,0,12129,
in-389,server2,0,0,4,4,1024,12,8120,26848,,0,,0,8,0,0,MAINT,50,1,0,0,1,1174,1174,,1,2,2,,12,,2,0,,3,L7OK,0,0,,,,,,,0,,,,0,8,,,,,1179,,,0,0,0,2,
in-389,server3,0,0,1,1,1024,2,1356,3510,,0,,0,0,0,0,MAINT,10,1,0,0,1,1174,1174,,1,2,3,,2,,2,0,,1,L7OK,0,7,,,,,,,0,,,,0,0,,,,,1218,,,0,1,0,1,
in-389,server4,0,0,1,1,1024,4,3386,10627,,0,,0,2,0,0,MAINT,10,1,0,0,1,1174,1174,,1,2,4,,4,,2,0,,1,L7OK,0,7,,,,,,,0,,,,0,2,,,,,1215,,,0,1,0,2,
in-389,BACKEND,0,0,57,60,400,1050,1109359,55405656,0,0,,0,843,0,0,UP,50,1,0,,0,1219,0,,1,2,0,,1050,,1,1,,28,,,,,,,,,,,,,,0,843,0,0,0,0,1,,,0,0,0,12130,
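
The raw CSV is hard to read; a quick way to pull out just the session
and error counters per server is something like this (field numbers
follow the stats CSV format as far as I can tell: 5 = scur, 8 = stot,
15 = eresp, 18 = status):

echo "show stat" | socat unix-connect:/tmp/haproxy stdio | cut -d, -f1,2,5,8,15,18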

My extra sysctl settings on the virtual load balancer:
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 5000
net.core.netdev_max_backlog = 5000
net.ipv4.ip_local_port_range = 1025 65000
net.core.rmem_max = 12582912
net.core.wmem_max = 12582912
net.ipv4.tcp_rmem = 10240 87380 12582912
net.ipv4.tcp_wmem = 10240 87380 12582912
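
For completeness: these sit in /etc/sysctl.conf (an assumption about the
exact file) and can be (re)applied and spot-checked with:

sysctl -p
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem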

Some example log lines:
[10/Sep/2014:15:21:52.236] in-389 in-389/server1 1/0/281 3356 SD 58/57/57/51/0 0/0
[10/Sep/2014:15:22:04.211] in-389 in-389/server1 1/0/281 3356 SD 58/57/57/51/0 0/0
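
For reference, the fields in these tcplog lines are Tw/Tc/Tt (queue,
connect and total times), bytes read, the termination state (S = the
session was aborted or reset by the server, D = it happened during the
data phase), the connection counters and the queue sizes. Open sessions
can also be inspected live through the stats socket:

echo "show sess" | socat unix-connect:/tmp/haproxy stdio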

The output of "netstat -i" shows no interface errors on either the
haproxy or the ldap server (both use the virtio network driver).
However, TCP resets do show up in the "netstat -s" counters.
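
One way to see which side actually sends the resets is to capture only
the RST segments on the ldap port on both machines, e.g. (the interface
name is an assumption):

tcpdump -nni eth0 'port 389 and tcp[tcpflags] & tcp-rst != 0'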

For the haproxy server:
Tcp:
    1634017 active connections openings
    910828 passive connection openings
    732 failed connection attempts
    186392 connection resets received
    126 connections established
    165215192 segments received
    158185858 segments send out
    194179 segments retransmited
    426 bad segments received.
    1238664 resets sent
Udp:
    734717 packets received
    173 packets to unknown port received.
    0 packet receive errors
    1520025 packets sent
UdpLite:
TcpExt:
    4639 invalid SYN cookies received
    234010 TCP sockets finished time wait in fast timer
    481844 delayed acks sent
    5124 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6322 times
    618224 packets directly queued to recvmsg prequeue.
    1629109 packets directly received from backlog
    47034427 packets directly received from prequeue
    96825479 packets header predicted
    49243 packets header predicted and directly queued to user
    7643396 acknowledgments not containing data received
    54024702 predicted acknowledgments
    1868 times recovered from packet loss due to fast retransmit
    68518 times recovered from packet loss due to SACK data
    Detected reordering 24 times using FACK
    Detected reordering 22 times using SACK
    TCPDSACKUndo: 875
    2874 congestion windows recovered after partial ack
    73099 TCP data loss events
    TCPLostRetransmit: 1588
    20 timeouts after reno fast retransmit
    620 timeouts after SACK recovery
    202 timeouts in loss state
    147422 fast retransmits
    9739 forward retransmits
    12133 retransmits in slow start
    14567 other TCP timeouts
    TCPRenoRecoveryFail: 372
    373 sack retransmits failed
    6260 DSACKs sent for old packets
    3299 DSACKs received
    876798 connections reset due to unexpected data
    183198 connections reset due to early user close
    685 connections aborted due to timeout

And for the ldap server:
Tcp:
    21001 active connections openings
    1263366 passive connection openings
    297 failed connection attempts
    948390 connection resets received
    61 connections established
    48712376 segments received
    73981815 segments send out
    7113 segments retransmited
    19 bad segments received.
    478141 resets sent
Udp:
    176452 packets received
    86 packets to unknown port received.
    0 packet receive errors
    1055678 packets sent
UdpLite:
TcpExt:
    15894 invalid SYN cookies received
    48 resets received for embryonic SYN_RECV sockets
    17739 TCP sockets finished time wait in fast timer
    319947 delayed acks sent
    6394 delayed acks further delayed because of locked socket
    Quick ack mode was activated 23 times
    68065 packets directly queued to recvmsg prequeue.
    10146 packets directly received from backlog
    513207 packets directly received from prequeue
    5543973 packets header predicted
    426 packets header predicted and directly queued to user
    4391742 acknowledgments not containing data received
    35677710 predicted acknowledgments
    1688 times recovered from packet loss due to SACK data
    TCPDSACKUndo: 20
    1103 congestion windows recovered after partial ack
    1578 TCP data loss events
    TCPLostRetransmit: 14
    55 timeouts after SACK recovery
    1 timeouts in loss state
    2936 fast retransmits
    135 forward retransmits
    30 retransmits in slow start
    2058 other TCP timeouts
    1 sack retransmits failed
    23 DSACKs sent for old packets
    70 DSACKs received
    2304 connections reset due to unexpected data
    457369 connections reset due to early user close
    30 connections aborted due to timeout

Any hints are very much appreciated. If more info is needed, let me know.

Franky
