So I'm still debugging these potentially slow backends, but I'm seeing
some wildly varying times in the haproxy logs.

Some clients are really slow to send their request headers:
945/0/1/2/948

Some backends are really slow to send their response:
0/0/0/1556/1556
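
(If I'm reading the default httplog format right, those five numbers are
Tq/Tw/Tc/Tr/Tt: time to receive the request headers from the client, time
queued in haproxy, time to connect to the server, time for the server to
send its response headers, and total session time. So the first example
looks like a slow client and the second like a slow server.) To eyeball
the split I've been using a quick awk hack along these lines; the timer
field position depends on the log setup, so the $10 below is just a guess
for a stock syslog-prefixed httplog line:

  # print Tq (client header time) and Tr (server response time),
  # worst client times first
  awk '{ n = split($10, t, "/"); if (n == 5) print t[1], t[4] }' haproxy.log \
      | sort -rn | head -20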

For more background: the clients are our Scala web app making requests
to a backend HTTP service through haproxy. Scala, as many of you know,
runs on the JVM. A colleague of mine theorizes that garbage collection
pauses might be getting in the way of these requests, which would explain
the extremely slow request headers. We'll be looking into tuning the
clients somehow, whether that means GC tuning or simply bulking up the
pool of app servers to reduce the GC overhead per server.
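
One thing we'll probably try first is turning on GC logging on the Scala
app so we can line stop-the-world pauses up against the slow Tq values in
the haproxy log. Just a sketch (HotSpot-style flags; the jar name is a
stand-in for however the app actually launches):

  # standard HotSpot GC-logging flags; writes one line per collection
  # plus application-stopped time to gc.log
  java -verbose:gc \
       -XX:+PrintGCDetails \
       -XX:+PrintGCDateStamps \
       -XX:+PrintGCApplicationStoppedTime \
       -Xloggc:/var/log/app/gc.log \
       -jar our-webapp.jar

If stop-the-world pauses line up with the 900+ ms Tq spikes, that would
pretty much settle the client-side question.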

Is it possible that the slow backends reported by haproxy are not
actually slow, but that the clients are holding up the conversation and
not pulling the response back through haproxy? Does haproxy fully buffer
the response the way nginx can? Is there some tuning I can do to help
prove that the clients are slow and not the backends?
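
One idea I had for forcing the issue (just a sketch, numbers made up):
cap how long a client may take to send its request headers, so that slow
clients show up in the log unambiguously instead of looking like a slow
backend. Something like:

  defaults
      mode http
      option httplog
      # kill requests whose headers haven't fully arrived within 5s;
      # if I understand the docs, such sessions get logged with a 408
      # and a client-side ('cR') termination state, which points the
      # finger at the client rather than the server
      timeout http-request 5s
      timeout connect      5s
      timeout client       30s
      timeout server       30s

Counting the 408s / 'cR' flags (halog -st and -tc should help here, per
below) would then tell us how often it's really the client.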

On Tue, Jun 21, 2011 at 2:15 PM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi David,
>
> On Tue, Jun 21, 2011 at 02:04:50PM -0700, David Birdsong wrote:
>> I'm in the process of trying to debug a somewhat sluggish set of
>> backends. The backends are ~20 Python Tornado web servers that make a
>> simple blocking DB call to MongoDB. My theory is that the request rate
>> can periodically overwhelm the backends when outside forces slow the
>> app down slightly, causing connections to sit briefly in each app
>> server's listen queue while it works through its backlog.
>>
>> I'm pretty sure halog can help me figure this out for certain, but
>> either I can't seem to invoke the correct command-line args or I'm not
>> comprehending the output properly.
>>
>> For example:
>>
>> <CMD TO PIPE ~3k relevant request lines> | halog -ad 10 gives me lines like:
>> 21:02:21.020 75741020 11 1
>> 21:02:21.050 75741050 13 1
>> 21:02:21.313 75741313 17 1
>> 21:02:21.522 75741522 10 1
>> 21:02:21.549 75741549 13 1
>> 21:02:21.661 75741661 11 2
>> 21:02:21.704 75741704 12 1
>> 21:02:21.762 75741762 15 1
>>
>>
>> I expect this to mean: 'filter out any lines that indicate an accept
>> time below 10 ms and show me anything greater'. However, -ad is an
>> input filter, so I have no idea what the output means.
>
> No, -ad is one of the few hacks that was initially developed in halog
> for a specific purpose. It reports the dates at which there was a hole
> of more than 10 ms with no requests, and the number of requests which
> suddenly arrived after the hole. It helped spot system issues on a
> machine running haproxy.
>
> What you should use are -srv (report per-server stats), -st (stats on
> status codes), -tc (stats on termination codes), and -pct (output
> percentiles of connect times, response times and data times). You can
> also make use of -u (report stats by URL, optionally sorted by average
> time, errors, etc.).
>
> Do not hesitate to add new features to halog; each time I had to use
> it on a serious problem, I found that some filtering or output
> capability was missing and added it.
>
> If you need more assistance on precise log extracts, please send them
> off-list.
>
> Regards,
> Willy
>
>
