I have haproxy installed as a load balancer in front of two Exchange 2010 CAS servers for SSL offloading, and it becomes unusable once it passes about 1000 concurrent connections. CPU never goes over ~30%, memory usage is relatively low, and concurrent connections are at about 1800 when it falls over. At around 800 concurrent connections everything seems to work fine. Everything also works well in testing; I only see problems when I move our production traffic to haproxy.
Basically the site stops accepting connections at that point. If I restart haproxy it works again, but only for a short time before it becomes unresponsive. I have looked at various OS-level TCP optimizations without any success. A basic count, something like netstat -an | wc -l, shows about 58K connections.

The only thing I have found that I think may be causing this is Outlook Anywhere (RPC over HTTPS). I did not find the http-no-delay option until after testing, so I am wondering whether this one setting could cause this type of behaviour. I am assuming it might, since connections hang until the client timeout. I have not seen it referenced in any of the example Exchange 2010 or 2013 configs.

Am I on the right track? Can anyone else share their experience with offloading Exchange SSL connections, including Outlook Anywhere clients?

Here are the relevant parts of my config. Note that http-no-delay was NOT set during the failed test; it is in the config below only for testing in our next maintenance window.
    defaults
        # option http-server-close   # set Connection: close to inspect all HTTP traffic
        option http-keep-alive       # This is actually the default and keeps the connection
                                     # open to both client and server
        option http-no-delay         # forward packets immediately, needed for RPC over HTTPS
        option dontlognull           # Do not log connections with no requests
        option redispatch            # Try another server in case of connection failure
        option contstats             # Enable continuous traffic statistics updates
        retries 3                    # Try to connect up to 3 times in case of failure
        timeout connect 5s           # 5 seconds max to connect or to stay in queue
        timeout client 300s          # 5 minute timeout for clients
        timeout server 300s          # 5 minute timeout for servers
        timeout http-keep-alive 1s   # 1 second max for the client to post next request
        timeout http-request 15s     # 15 seconds max for the client to send a request
        timeout queue 30s            # 30 seconds max queued on load balancer
        timeout tarpit 1m            # tarpit hold time
        backlog 10000                # Size of SYN backlog queue

    ....

    frontend vs_owa_DOMAIN_https
        bind IP.IP.IP.IP:80 name vs_owa_DOMAIN_http
        bind IP.IP.IP.IP:443 name vs_owa_DOMAIN_https ssl crt /etc/ssl/certs/email.DOMAIN.org.pem
        mode http
        log global
        option httplog
        capture request header User-Agent len 64
        capture request header Host len 32
        option forwardfor            # add X-Forwarded-For to headers
        log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ {%sslv/%sslc/%[ssl_fc_sni]/%[ssl_fc_session_id]}\ %{+Q}r
        maxconn 5000
        http-request redirect scheme https code 302 if !{ ssl_fc }
        http-request redirect location /owa/ code 302 if { hdr(Host) <WEBMAIL_VIRTUAL_HOST> } { path / }
        default_backend pool_owa_DOMAIN_http

    backend pool_owa_DOMAIN_http
        balance roundrobin
        mode http
        log global
        option prefer-last-server
        option httplog
        option forwardfor
        option redispatch
        stick-table type ip size 10240k expire 30m
        stick on src
        default-server inter 3s rise 2 fall 3
        cookie SERVERID insert indirect nocache
        server SRV1 IP.IP.IP.14:80 maxconn 2000 weight 10 check cookie srv1
        server SRV2 IP.IP.IP.26:80 maxconn 2000 weight 10 check cookie srv2
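
The raw netstat -an | wc -l figure mentioned above counts every output line, so it lumps together header lines, listening sockets, UNIX sockets, and every TCP state. A breakdown by TCP state would show whether most of the ~58K are ESTABLISHED or something like TIME_WAIT. The following is only a minimal sketch of that breakdown, assuming the usual Linux `netstat -ant` column layout (state in the sixth column); the function name `count_tcp_states` is hypothetical and chosen for illustration.

```python
import shutil
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state from `netstat -ant` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; header lines are skipped.
        # Assumption: the state is the sixth whitespace-separated column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def main() -> None:
    if shutil.which("netstat") is None:
        print("netstat not found; pass saved output to count_tcp_states() instead")
        return
    result = subprocess.run(["netstat", "-ant"], capture_output=True, text=True)
    # Print the most common states first.
    for state, count in count_tcp_states(result.stdout).most_common():
        print(f"{state:<12} {count}")


if __name__ == "__main__":
    main()
```

Run on the haproxy host while the problem is occurring, this prints one line per TCP state with its count, which is easier to compare against the concurrent connection numbers haproxy itself reports than a single line total.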