On Wed, Jul 17, 2013 at 08:16:18PM +0200, Lukas Tribus wrote:
> Hi Willy,
> 
> 
> > This explains why this only happens for short durations (at most the 
> > duration
> > of a client timeout).
> 
> Good to hear you pinpointed this.
> 
> 
> 
> > What is important is to know that CPU usage aside, there is no loss of
> > information nor service. Connections are correctly handled
> 
> Mark wrote earlier that in single process mode, when HAProxy hogged the CPU,
> the service was unresponsive, which is why he enabled multi process mode. Do
> you confirm this is not the case?

From my analysis, this should not be the case. But the issue is still very
tricky to completely get right, and if in the end I found a long dependency
chain, I would not be that surprised. The traces that Mark sent me clearly
show that there is an unhandled event which prevents poll() from waiting,
hence the loop, but all parallel activity is still handled well. That does
not mean that there isn't another bug in parallel.
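
To illustrate the mechanism with a standalone sketch (this is not haproxy
code, just a minimal example): when an fd is reported ready but its event is
never consumed, poll() returns immediately on every iteration, so the process
spins at 100% CPU even though everything else it handles still works:

    /* minimal sketch, not haproxy code: an fd stays readable because its
     * event is never consumed, so poll() never blocks and the loop spins */
    #include <poll.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        int p[2];
        struct pollfd pfd;
        int i;

        if (pipe(p) < 0)
            return 1;

        /* make p[0] readable once, then "forget" to ever read it */
        if (write(p[1], "x", 1) != 1)
            return 1;

        pfd.fd = p[0];
        pfd.events = POLLIN;

        for (i = 0; i < 5; i++) {
            /* returns immediately every time: the event is still pending */
            int n = poll(&pfd, 1, 1000);
            printf("poll() returned %d (revents=0x%x)\n", n, pfd.revents);
            /* the fix is to consume the event, e.g. read(p[0], ...) */
        }
        return 0;
    }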

Also I tend to be careful about conclusions drawn during a production
incident, because often we realize after the bug is fixed that the first
analysis was incomplete or wrong due to the urgency of the situation or the stacking
of operations. For example I once faced an issue where haproxy was working
in SSL mode at very high load. After killing it and restarting it, it would
go 100% user and not serve any request for a while. I immediately thought
about a big bug preventing connections from being accepted. In fact it was
simply because we lost all SSL cache information and all incoming connections
had to be renegotiated. The machine was on its knees performing RSA
computations for every incoming connection (the machine was undersized for
SSL). And indeed,
after some time, everything went smoothly again.

Best regards,
Willy

