Hi Joao,

On Mon, Feb 18, 2019 at 09:31:39PM -0300, Joao Morais wrote:
> 
> 
> > Em 16 de fev de 2019, à(s) 03:16, Willy Tarreau <[email protected]> escreveu:
> > 
> > If you have some time to run some extra tests, it would be nice to rebuild
> > haproxy with "ARCH_FLAGS=-pg", run it again, stop it using kill -USR1 (not
> > ctrl-C), and run "gprof haproxy gmon.out". It will show the number of calls
> > to each function and a rough approximation of the time spent there. We may
> > find a huge number of calls to the culprit and possibly we could improve
> > it.
> 
> Hi Willy, sure. Attached is the gprof output for 300 rps over 20s, a total
> of 6000 requests. Each request is empty (has only trivial headers) and the
> response body has only 4 bytes. There are 12 servers in the backend, each
> waiting 200ms before sending its response.

Thank you! So based on this:

  0.00      0.48     0.00 17186054     0.00     0.00  pattern_exec_match
  0.00      0.48     0.00 16702151     0.00     0.00  sample_process
  0.00      0.48     0.00 12199564     0.00     0.00  acl_exec_cond
  0.00      0.48     0.00 11006144     0.00     0.00  get_trash_chunk
  0.00      0.48     0.00 10691267     0.00     0.00  smp_dup
  0.00      0.48     0.00 10574560     0.00     0.00  smp_fetch_var
  0.00      0.48     0.00  9993833     0.00     0.00  pat_match_str
  0.00      0.48     0.00  8877613     0.00     0.00  lru64_get
  0.00      0.48     0.00  7846046     0.00     0.00  XXH64
  0.00      0.48     0.00  6739347     0.00     0.00  smp_fetch_ssl_fc
  0.00      0.48     0.00  6556205     0.00     0.00  pat_match_nothing

we can say that each request makes ~1100 calls to the ssl_fc sample fetch
function, ~1800 accesses to a variable, and ~2000 ACL evaluations. Thus
it typically looks like a long series of:

     use_backend foo if { ssl_fc } { var(req.host) www.example.com }

A long time ago I intended to implement sample caching between rules,
thinking it would save many lookups, but that's exactly what variables
already provide. It would be possible to slightly optimize this by
making a composite variable combining the SSL state and the Host header
field, so that you could have lines like this for example:

     use_backend foo if { var(req.host) ssl:www.example.com }

But clearly, at this point, when you stack thousands of them it's still
expensive to repeat the evaluation of these rules even though each of
them is very cheap. Right now the variables are placed in a linked list
per scope. So if you have few variables, they're fast to look up, but if
you have many of them (say 10 or more), retrieving each of them may take
a bit of time, at which point we should probably think about placing
them in a hash table or a tree.

I'm also taking a look at the pattern matching code used with ACLs in
the form above; it was already optimised so that for small lists of
inline patterns like this we use a list rather than a tree (since
checking a single pattern in a list is cheaper than a tree lookup).
Also, we're retrieving the match from the pattern cache, so it's likely
even slightly faster than a bare string compare.

At this point I think that such heavy configs are reaching their limits
and that the only right solution is a dynamic use_backend (possibly
with a map).
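For example (a sketch; the map file path and backend names are made up):

     use_backend %[req.hdr(host),lower,map(/etc/haproxy/hosts.map,bk_default)]

where /etc/haproxy/hosts.map contains one "host backend" pair per line:

     www.example.com foo
     api.example.com bar

This way a single rule replaces the whole series, and the map gives a
single lookup per request instead of a linear walk through all the
use_backend rules.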

> There are also a lot of other backends and
> servers with health check enabled every 2s consuming some cpu and network.

For this, if the same server appears many times, you can use the
"track" directive and only enable checks on one subset of servers,
having all the others track them. Typically you'd have a dummy backend
dedicated to checks, with checks disabled in all other backends and
replaced with "track".
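For example (a sketch with made-up names and addresses):

     backend bk_checks
         server srv1 192.0.2.10:80 check inter 2s

     backend bk_app
         # no "check" here: reuse the check state of bk_checks/srv1
         server srv1 192.0.2.10:80 track bk_checks/srv1

This way each physical server is checked only once, no matter how many
backends reference it, which cuts both the CPU and the network cost of
the checks.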

> Note also that I needed to add -no-pie, otherwise the gprof output was
> empty -- sounds like a gcc issue. Let me know if this is good enough.

Yes, that's fine, and the output was perfectly usable.

Thanks!
Willy
