On Sat, May 12, 2012 at 07:01:19PM +0300, Bar Ziony wrote:
> Hey,
> 
> I have a dynamic backend with maxconn 80 with multiple servers.
> Many times I can see on the haproxy stats page that servers on this backend
> are reaching their maximum 80, but I don't see the number of requests
> currently in queue. The maximum number I ever see is 80. Why is that? Can I
> somehow see the number of requests in the queue?

The queue is split between the servers and the backend. A server's queue
only holds the requests which absolutely need to be served by that server
(due to a persistence cookie or stick-tables). Any other request waits in
the backend's queue so that it will be served by the first available
server. It's quite normal to see few requests in the servers' queues and
more in the backend's queue, so check the queue figures on the backend's
own line of the stats page, not only on the server lines.
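To get the same figure without the HTML page, the CSV export of the stats
page has a `qcur` column on every line: on the BACKEND line it counts the
requests queued without a server assigned, and on each server line the
requests waiting for that particular server. Below is a minimal Python
sketch, assuming the CSV text has already been fetched. The sample keeps
only a few of the real columns, and the backend name "dynamic" and the
queue sizes (taken from the srv1/srv2 example further down) are purely
illustrative.

```python
import csv
import io

# Trimmed, illustrative sample of HAProxy's CSV stats output. Real output has
# many more columns; only pxname, svname and qcur are used here.
SAMPLE_CSV = """\
# pxname,svname,qcur,scur,slim
dynamic,srv1,10,80,80
dynamic,srv2,10,80,80
dynamic,BACKEND,100,160,
"""


def total_queue(csv_text: str, backend: str) -> dict:
    """Split a backend's queue into its backend-level and per-server parts."""
    # The header line starts with "# ", which would otherwise end up in the
    # first field name, so strip it before parsing.
    text = csv_text.lstrip("# ")
    result = {"backend": 0, "servers": {}}
    for row in csv.DictReader(io.StringIO(text)):
        if row["pxname"] != backend or row["svname"] == "FRONTEND":
            continue
        queued = int(row["qcur"] or 0)
        if row["svname"] == "BACKEND":
            # Requests still waiting for any server to become available.
            result["backend"] = queued
        else:
            # Requests that must be served by this specific server.
            result["servers"][row["svname"]] = queued
    result["total"] = result["backend"] + sum(result["servers"].values())
    return result


if __name__ == "__main__":
    print(total_queue(SAMPLE_CSV, "dynamic"))
```

Summing the BACKEND line with the server lines gives the total queue used
for sizing below.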

> Also, with a munin plugin that checks the HTTP page with ";csv", I see that
> sometimes the graph shows 400+ req/sec for this backend, which is not
> possible since the maximum is 80...
> Last, what is the difference between "Sessions" and "Session rate" ?

I'm afraid you are confusing concurrency and rate. Think of a highway: it's
the same thing. The session rate is the number of cars you see passing an
observation point each second. Session concurrency ("Sessions") is the
number of parallel lanes that are occupied at a given instant. If traffic
slows down, you need more lanes to drain the same number of cars without
lowering the rate. If your cars drive faster, you need fewer lanes for the
same rate of cars. maxconn limits concurrency, not rate: 80 concurrent
sessions lasting 100 ms each are enough to serve 800 sessions per second.
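The highway picture is Little's law: concurrency = rate x average session
duration. A small Python sketch with made-up durations shows how a server
capped at 80 concurrent sessions can still account for 400+ sessions per
second; the function names are only illustrative.

```python
def session_rate(concurrent_sessions: int, avg_session_ms: float) -> float:
    """Sessions per second sustained by a given concurrency (Little's law)."""
    return concurrent_sessions * 1000 / avg_session_ms


def lanes_needed(rate_per_s: float, avg_session_ms: float) -> float:
    """Concurrent sessions ("lanes") required to sustain a given rate."""
    return rate_per_s * avg_session_ms / 1000


if __name__ == "__main__":
    # 80 concurrent sessions of 100 ms each: 800 sessions per second.
    print(session_rate(80, 100))
    # 400 sessions per second at 200 ms each only occupies 80 lanes.
    print(lanes_needed(400, 200))
    # If responses slow down to 400 ms, the same rate needs 160 lanes.
    print(lanes_needed(400, 400))
```

Slower responses raise concurrency for the same rate, which is why a
maxconn limit turns slowdowns into queuing rather than lowering the rate
shown on the graphs.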

Regards,
Willy

> How can I tell when I need another dynamic backend server?

It's simple: observe the total queue size in a backend (the backend's
queue plus the sum of the servers' queues). Divide that number by a
server's maxconn and it will tell you how many additional servers would
allow the requests to be processed without queuing. Note that it's fine
to have a bit of queuing: it saves you from buying more hardware at the
expense of slightly delayed processing. You just need to ensure the queue
is not too deep. The average time spent in the queue is the average queue
size divided by the total maxconn of all the servers draining it,
multiplied by the average response time.

So, to get an idea:

     srv1 has maxconn 80 and queue around 10
     srv2 has maxconn 80 and queue around 10
     backend has a queue around 100

The total queue is 120, which is the equivalent of 1.5 servers. Let's say
you add a single server: it will take around 80 of those requests, leaving
about 40 still in the queues. Those 40 are now drained by all three
servers, i.e. 240 connection slots, so if your servers exhibit an average
response time of 50 ms, the average time spent in the queue will be
40/240*50 ms, or about 8 ms, and the total response time will increase
from 50 ms to about 58 ms due to the queue. For many sites this will not
be noticeable and is probably acceptable. Now if your site is already slow
(e.g. 2 seconds response time), the same queue adds one sixth more, about
330 ms, bringing it to about 2.33 seconds. Since the queuing delay grows in
proportion to the response time, users of a slow site will notice it much
sooner.
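The sizing rules can be written down as a short Python sketch. It follows
Little's law, so the queuing delay divides the queue by the combined
maxconn of every server draining it (240 for three servers at maxconn 80);
with a single server this is the same as dividing by its own maxconn. The
function names are hypothetical and the inputs are the figures from the
example above.

```python
import math


def extra_servers(backend_queue: int, server_queues: list[int], maxconn: int) -> float:
    """Servers needed on top of the current ones to absorb the whole queue."""
    total_queue = backend_queue + sum(server_queues)
    return total_queue / maxconn


def queue_delay_ms(queued: int, total_maxconn: int, avg_response_ms: float) -> float:
    """Average time a request waits in the queue (Little's law)."""
    # total_maxconn is the sum of maxconn over all servers draining the queue.
    return queued / total_maxconn * avg_response_ms


if __name__ == "__main__":
    needed = extra_servers(100, [10, 10], maxconn=80)
    print(needed)                 # 1.5 servers' worth of queued requests
    print(math.ceil(needed))      # 2 more servers to remove queuing entirely
    # After adding one server: 3 x 80 slots, about 40 requests still queued.
    delay = queue_delay_ms(40, 3 * 80, 50)
    print(round(delay, 1))        # 8.3
    print(round(50 + delay, 1))   # 58.3
```

Rounding `extra_servers` up gives the count that eliminates queuing; keeping
it lower is the "bit of queuing" trade-off described above.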

That's why you first need to keep response times as low as possible by
limiting maxconn, and only then estimate the number of servers needed to
keep the overall response time low.
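Once maxconn has been chosen to keep response times low, a rough server
count for an expected peak rate follows from the same law. This is a
hedged Python sketch; the helper name and the peak rate of 4800 requests
per second are made-up illustrations, not measured figures.

```python
import math


def servers_for_rate(peak_rate_per_s: float, avg_response_ms: float, maxconn: int) -> int:
    """Smallest server count whose combined maxconn covers the peak concurrency."""
    concurrency = peak_rate_per_s * avg_response_ms / 1000  # Little's law
    return math.ceil(concurrency / maxconn)


if __name__ == "__main__":
    print(servers_for_rate(4800, 50, 80))    # 240 concurrent sessions -> 3
    print(servers_for_rate(4800, 100, 80))   # twice as slow -> 6
```

Halving the response time halves the number of servers needed, which is
why limiting maxconn to keep responses fast comes first.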

Hoping this helps,
Willy

