On 11/23/2013 04:13 AM, Willy Tarreau wrote:
This is 25% user and 75% system. It's on the high side for the user, since
you generally get between 15 and 25% user for 75-85% system, but since you
have logs enabled, it's not really surprising, so yes it's in the norm. You
should be able to slightly improve this by using "http-server-close" instead
of "httpclose". It will actively close server-side connections and save a
few packets.
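For reference, the difference is a single keyword in the defaults or frontend section; a minimal sketch (timeouts and section layout are illustrative, not from the thread):

```
defaults
    mode http
    # "http-server-close" actively closes the server-side connection
    # after each response, instead of the passive close that
    # "option httpclose" performs on both sides.
    option http-server-close   # instead of: option httpclose
    timeout connect 5s
    timeout client  30s
    timeout server  30s
```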


My understanding was that HAProxy 1.4 does not formally support keeping persistent connections to backends while closing connections to clients. However, if the backend servers used keep-alive and HAProxy did not force the connection to close, this would likely work. I thought that continuing to push bytes over a small number of existing TCP connections ought to be cheaper (in terms of packets, interrupts, %sys, etc.) than setting up and tearing down yet more sockets.


While I don't have an empirical basis for comparison on this
older hardware, 20k req/s also "seemed" low.

I remember having benchmarked another recent opteron last year (with
many cores, a 7-something) and it performed very poorly, about 18k/s,
much worse than my 4-years old Phenom 9950. One of the reasons was
that it was difficult to share some L3 cache between certain cores.
I found little information on the 4184, except on Wikipedia (to be
taken with a grain of salt):

    http://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29

Thus it probably suffers from the same design as the 7xxx, which is
that you need to identify the cores belonging to the same module, so
that they share the same L2 cache and that they are located in the
same part of the shared L3 cache, otherwise the inter-core communications
happen via the outside.


As far as I can tell from AMD docs and Vincent's handy /sys trick, each of the 6 cores has a fully independent L2 cache, and the chip has a single shared L3 cache.
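The cache/core layout can be checked directly from sysfs; a quick sketch of that kind of inspection (Linux-specific paths; `index2` is usually the L2 cache on these CPUs):

```shell
# For each CPU, print its core id, package id, and which CPUs
# share its L2 cache, all read from standard sysfs topology files.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    name=$(basename "$cpu")
    core=$(cat "$cpu/topology/core_id" 2>/dev/null)
    pkg=$(cat "$cpu/topology/physical_package_id" 2>/dev/null)
    l2=$(cat "$cpu/cache/index2/shared_cpu_list" 2>/dev/null)
    echo "$name: core=$core package=$pkg l2_shared_with=$l2"
done
```

On a Bulldozer-family part, two cores in the same module would report the same L2 `shared_cpu_list`; fully independent L2s show each core alone in its own list.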

I'm not sure I'm following the part about the "same part of the L3 cache". Are you saying that some cores are "closer" to each other on the L3 cache, like NUMA?

These CPUs seem to be designed for VM hosting, or running highly
threaded Java apps which don't need much FPU. I'm not certain they
were optimized for network processing unfortunately, which is sad
considering that their older brothers were extremely fast at that.


"Highly threaded Java apps" happens to be what most of our servers are used for and what we benchmarked for purchasing decisions.

Finally, assuming the single-process performance cannot be further
improved, I was considering the following setup:
  * core 0: eth0 interrupts
  * core 1: haproxy bound to eth0
  * core 2: eth1 interrupts
  * core 3: haproxy bound to eth1
  * core 4-5: ssl terminator
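The plan above boils down to `taskset` for the processes and `/proc/irq/<N>/smp_affinity` masks for the NICs; a dry-run sketch that only prints the commands (IRQ numbers, config paths, and the SSL terminator binary are hypothetical placeholders):

```shell
# Compute the hex affinity bitmask for a comma-separated core list,
# e.g. mask_for "4,5" -> "30" (bits 4 and 5 set).
mask_for() {
    local m=0 c
    for c in ${1//,/ }; do m=$((m | (1 << c))); done
    printf '%x\n' "$m"
}

echo "eth0 IRQs : echo $(mask_for 0) > /proc/irq/<eth0-irq>/smp_affinity"
echo "haproxy 1 : taskset -c 1 haproxy -f /etc/haproxy/eth0.cfg"
echo "eth1 IRQs : echo $(mask_for 2) > /proc/irq/<eth1-irq>/smp_affinity"
echo "haproxy 2 : taskset -c 3 haproxy -f /etc/haproxy/eth1.cfg"
echo "ssl term  : taskset -c 4,5 <ssl-terminator> (mask $(mask_for 4,5))"
```

Keeping each haproxy on the core adjacent to its NIC's interrupt core is the point: the packets a process consumes were just touched by the neighbouring core's softirq handler.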

I definitely agree. I know at least one setup which runs fine this way.
It was a two-socket system, each with its own NIC and process. But here
you're in the same situation, consider that you have 3 independent CPUs
in the same box. The benefit of doing it this way is that you can still
parallelize network interrupts to multiple cores without having the
response traffic come to the wrong core (proxies are hell to optimize
because of their two sides).


This setup (one haproxy per NIC) was able to handle 50% more load than a single haproxy: from about 20k req/s to 30k. This is a very nice bump from what would otherwise be mostly idle CPU cores. We found it very complex to set up at the IP layer, though (which isn't haproxy's fault, but in our particular circumstances it might not be worth it).


But I could not find too many examples of similar setups and was unsure
if it was a viable long term configuration.

Yes it is viable. The only limit right now is that you'll need to start
two processes. In the future, when listeners reliably support the
"bind-process" keyword, it will even be possible to centralize
everything and have a dedicated stats socket for each.

In the meantime I suggest that you run two processes with almost the
same config, except for the interfaces. Note that haproxy supports
binding to interfaces.
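A sketch of what one of the two near-identical configs might look like (file names, backend addresses, and socket paths are illustrative; the eth1 config would differ only in the interface, address, and socket path):

```
# /etc/haproxy/eth0.cfg
global
    stats socket /var/run/haproxy-eth0.sock

frontend fe_eth0
    # bind this instance's traffic to a single interface
    bind :80 interface eth0
    default_backend app

backend app
    server app1 10.0.0.10:8080 check
```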

For reasons that could be completely incidental to our networking, I was unable to get "bind *:80 interface eth0" to work consistently and had to use "bind $IP:80 interface eth0". With the first form, the instance bound to eth0 would answer requests that were coming in on eth1.


Otherwise, all your config below looks fine.

Thank you for looking. I and several of my colleagues have found this thread most helpful.


