Hi,

it's a lot of information and I don't have time to go into all the
details right now, but from a quick read, here are the things I noticed:

- Why nbproc 64? Your CPU has 18 cores (36 w/ HT), so more processes than
that will likely make performance worse rather than better. HT siblings
share their cache, so using 18 might make the most sense (see also below).
It's best to experiment a little with that and measure the results,
though; a quick way to check which logical CPUs are HT siblings is shown
after this list.

- If you see ksoftirqd eating up a lot of one CPU, then your box is most
likely configured to process all IRQs on the first core. Most NICs these
days can be configured to use several IRQs, which you can then distribute
across all cores, smoothing the workload significantly (see the example
commands below).

- Consider using "cpu-map" to pin each process to a single core
("bind-process" only ties listeners to processes; "cpu-map" does the
actual CPU pinning). Make sure to leave out the HT cores, or disable HT
altogether. Less context switching might improve performance (a config
sketch follows below).
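
To figure out which logical CPUs are HT siblings (and thus which 18 to
use), something like this works on Linux; the numbers below are just an
example, yours depend on how the kernel enumerates the cores:

    lscpu -e=CPU,CORE    # maps logical CPUs to physical cores
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    # e.g. "0,18" would mean CPU 0 and CPU 18 share a physical core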
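
For the IRQ distribution, a rough sketch (the interface name eth0 and
the IRQ number 40 are made-up examples, yours will differ):

    grep eth0 /proc/interrupts    # one line per NIC queue/IRQ, with
                                  # per-CPU counters showing who handles it
    # move IRQ 40 to CPU 2 (needs root):
    echo 2 > /proc/irq/40/smp_affinity_list
    # equivalent, expressed as a hex CPU bitmask:
    echo 4 > /proc/irq/40/smp_affinity

Many distributions also ship irqbalance, which tries to spread IRQs
across cores automatically.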
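
And for the pinning itself, a minimal sketch of the global section
(shown with 4 processes for brevity; on your machine you would go up to
18, one process per physical core, assuming your HAProxy build has CPU
affinity support):

    global
        nbproc 4
        cpu-map 1 0    # process 1 -> CPU 0
        cpu-map 2 1
        cpu-map 3 2
        cpu-map 4 3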

Hope that helps,
Conrad



On 10/21/2016 04:47 PM, Christian Ruppert wrote:
> Hi,
> 
> again a performance topic.
> I did some further testing/benchmarks with ECC and nbproc >1, this time
> on an "E5-2697 v4", and the first thing I noticed was that HAProxy has a
> fixed limit of 64 for nbproc. So the setup:
> 
> HAProxy server with the mentioned E5:
> global
>     user haproxy
>     group haproxy
>     maxconn 75000
>     log 127.0.0.2 local0
>     ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDH
> 
>     ssl-default-bind-options no-sslv3 no-tls-tickets
>     tune.ssl.default-dh-param 1024
> 
>     nbproc 64
> 
> defaults
>     timeout client 300s
>     timeout server 300s
>     timeout queue 60s
>     timeout connect 7s
>     timeout http-request 10s
>     maxconn 75000
> 
>     bind-process 1
> 
> # HTTP
> frontend haproxy_test_http
>     bind :65410
>     mode http
>     option httplog
>     option httpclose
>     log global
>     default_backend bk_ram
> 
> # ECC
> frontend haproxy_test-ECC
>     bind-process 3-64
>     bind :65420 ssl crt /etc/haproxy/test.pem-ECC
>     mode http
>     option httplog
>     option httpclose
>     log global
>     default_backend bk_ram
> 
> backend bk_ram
>     mode http
>     fullconn 75000 # Just in case the lower default limit will be reached...
>     errorfile 503 /etc/haproxy/test.error
> 
> 
> 
> /etc/haproxy/test.error:
> HTTP/1.0 200
> Cache-Control: no-cache
> Connection: close
> Content-Type: text/plain
> 
> Test123456
> 
> 
> The ECC key:
> openssl ecparam -genkey -name prime256v1 -out /etc/haproxy/test.pem-ECC.key
> openssl req -new -sha256 -key /etc/haproxy/test.pem-ECC.key -days 365 \
>     -nodes -x509 -subj "/O=ECC Test/CN=test.example.com" \
>     -out /etc/haproxy/test.pem-ECC.crt
> cat /etc/haproxy/test.pem-ECC.key /etc/haproxy/test.pem-ECC.crt > /etc/haproxy/test.pem-ECC
> 
> 
> So then I tried a local "ab":
> ab -n 5000 -c 250 https://127.0.0.1:65420/
> Server Hostname:        127.0.0.1
> Server Port:            65420
> SSL/TLS Protocol:       TLSv1/SSLv3,ECDHE-ECDSA-AES128-GCM-SHA256,256,128
> 
> Document Path:          /
> Document Length:        107 bytes
> 
> Concurrency Level:      250
> Time taken for tests:   3.940 seconds
> Complete requests:      5000
> Failed requests:        0
> Write errors:           0
> Non-2xx responses:      5000
> Total transferred:      1060000 bytes
> HTML transferred:       535000 bytes
> Requests per second:    1268.95 [#/sec] (mean)
> Time per request:       197.013 [ms] (mean)
> Time per request:       0.788 [ms] (mean, across all concurrent requests)
> Transfer rate:          262.71 [Kbytes/sec] received
> 
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:       54  138  34.7    162     193
> Processing:     8   51  34.8     24     157
> Waiting:        3   40  31.6     18     113
> Total:        177  189   7.5    188     333
> 
> Percentage of the requests served within a certain time (ms)
>   50%    188
>   66%    189
>   75%    190
>   80%    190
>   90%    191
>   95%    192
>   98%    196
>   99%    205
>  100%    333 (longest request)
> 
> The same test with just nbproc 1 gave about ~1500 requests/s, so roughly
> 1.5k * nbproc is what I would have expected, or at least something near
> that value.
> 
> Then I set up 61 EC2 instances, standard t2.micro. They're somewhat
> slower, at ~1k ECC requests per second each, but that's OK for the test.
> HTTP (one proc) via localhost was around 27-28k r/s, remote (EC2) ~4500.
> 
> So then I started "ab" in parallel from each instance, and ECC
> throughput dropped to about ~4xx requests/s on each node, which is far
> below the ~1500 (single proc) or ~1300 (multi proc) baseline, a much
> bigger drop than I expected, tbh. I thought it would scale much better
> up to nbproc parallel clients and only get worse beyond nbproc. I did
> some basic checks to figure out the reason/bottleneck, and to me it
> looks like a lot of context switches/epoll_wait. In (h)top it shows that
> ksoftirqd plus one haproxy process are burning 100% CPU of a single
> core; the load is not distributed across multiple cores. I'm not sure
> yet whether it's related to the SSL part, HAProxy or some kernel foo.
> HTTP performs better: ~27k total on localhost, ~5400 from a single ab
> via EC2, and still ~2100 per EC2 instance with a total of 15 instances,
> and the HTTP frontend runs on just a single process!
> 
> So I wonder what the reason for the single saturated core is. Is that
> the cause of the rather poor performance, since it has an impact on all
> of those processes? Can that work be distributed across multiple
> cores/processes as well? Any ideas?
> 
> Oh, and I was using 1.6.5:
> HA-Proxy version 1.6.5 2016/05/10
> Copyright 2000-2016 Willy Tarreau <wi...@haproxy.org>
> 
> Build options :
>   TARGET  = linux2628
>   CPU     = generic
>   CC      = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>   OPTIONS = USE_LIBCRYPT=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
> 
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> 
> Encrypted password support via crypt(3): yes
> Built with zlib version : 1.2.7
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
> Running on OpenSSL version : OpenSSL 1.0.1t  3 May 2016 (VERSIONS DIFFER!)
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports prefer-server-ciphers : yes
> Built with PCRE version : 8.30 2012-02-04
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built without Lua support
> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
> IP_FREEBIND
> 
> Available polling systems :
>       epoll : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.
> 
> I actually thought I was using 1.6.9 on that host already, so I just
> upgraded and re-ran some benchmarks, but at first glance the results
> look almost identical.
> 

-- 
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
