Re: 100% CPU by epoll loop, while waiting for connection timeout

Tobias Vau Mon, 25 Jul 2016 07:26:18 -0700

Hi Willy,

2016-07-20 21:08 GMT+02:00 Willy Tarreau <w...@1wt.eu>:
> Hi Tobias,
>
> On Thu, Jul 14, 2016 at 04:52:29PM +0200, Tobias Vau wrote:
>> Hi,
>>
>> a small follow up to an older thread from November 2015, where massive
>> numbers of epoll_wait calls lead to 100% CPU consumption.
>>
>> My installation showed the same pattern. As it's also easily
>> reproducible for me with very moderate client traffic (1-10 conns/s),
>> you might maybe be interested in more debug info.
>> [...]
>> As I can very easily reproduce the behaviour with the current config
>> and a very moderate traffic pattern (1-5 conns / s), just let me know,
>> if you'd like to see some other debug info, than what's provided below.
>
> That's very useful.
>
> The detailed session state would be needed. You can have it by either
> requesting "show sess <id>" ex "show sess 0x7fd3aaa28040" below, or
> by issuing "show sess all" which will dump them all (much more useful
> as it allows us to validate a theory across other sessions).


I expected that you would ask for these, so I saved them at the time and
will forward them in private.
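
For anyone following along, here is a minimal sketch of how such a dump can be
saved from a script. It assumes the stats socket is enabled in the global
section ("stats socket ...") and reachable at /var/run/haproxy.sock; that path
and the function names are placeholders, not taken from my config.

```python
import socket


def run_stats_command(sock: socket.socket, command: str) -> str:
    """Send one command over an already-connected stats socket and return the reply."""
    sock.sendall(command.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            # In non-interactive mode the peer closes the connection
            # once the reply to the single command has been sent.
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def dump_sessions(socket_path: str = "/var/run/haproxy.sock") -> str:
    """Return the output of "show sess all" (socket path is an assumption)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return run_stats_command(sock, "show sess all")
```

Calling dump_sessions() while the CPU is spinning and writing the result to a
timestamped file keeps the session state together with the time it was taken.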

From what I saw on multiple occurrences of this condition, mobile internet
connections always seemed to be involved, but this could of course just be
an unrelated coincidence.

> The problem I've been facing was how to reproduce the condition. If you
> manage to reproduce it within a minute or so at 10 cps, it would be very
> useful to also take a tcpdump capture in parallel of the traffic between
> the client and haproxy and the traffic between haproxy and the server.
> That will help understand what traffic sequence triggers the issue and
> possibly what headers if any is involved. It also allows to eliminate
> some theories based on the configuration.

The test system I had has already been put into production (without the
problematic settings activated). As I already experienced the problem with only
a less important sub-domain routed over that haproxy instance, I'll try to set
up a second clone of the instance. I just need a few spare hours to re-route
that specific sub-domain over the new clone and log as much as possible.

As I do not know in advance which client will cause the faulty condition,
would two separate multi-minute tcpdumps of

  1) all client traffic to the loadbalancer IP
  2) all haproxy traffic to the backend

still be useful to you? They could at least be filtered afterwards for the
one client IP that owns such a faulty session.
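
To make the idea concrete, here is a rough sketch of the tcpdump invocations I
have in mind, built as argument lists in Python. All addresses, file names and
function names are made-up placeholders (documentation address ranges), and the
"any" interface is an assumption about the host.

```python
import shlex


def capture_cmd(interface: str, out_file: str, bpf_filter: str) -> list[str]:
    """Build a tcpdump command that writes full packets matching a filter to a file."""
    # -n: no name resolution, -s 0: full snap length, -w: write raw packets
    return ["tcpdump", "-n", "-i", interface, "-s", "0", "-w", out_file, bpf_filter]


def extract_cmd(in_file: str, out_file: str, client_ip: str) -> list[str]:
    """Build a tcpdump command that re-reads a capture and keeps one client's packets."""
    # -r: read from a saved capture instead of a live interface
    return ["tcpdump", "-n", "-r", in_file, "-w", out_file, f"host {client_ip}"]


if __name__ == "__main__":
    # Placeholder addresses: load balancer IP and backend IP.
    client_side = capture_cmd("any", "client-side.pcap", "host 192.0.2.10")
    backend_side = capture_cmd("any", "backend-side.pcap", "host 198.51.100.20")
    # Later, once a faulty session is known, cut the client-side dump down.
    one_client = extract_cmd("client-side.pcap", "one-client.pcap", "203.0.113.5")
    for cmd in (client_side, backend_side, one_client):
        print(shlex.join(cmd))
```

The two captures would run in parallel, and the extraction step only happens
once "show sess" has revealed which client address owns the stuck session.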

> You need to be fully aware that this will disclose a lot of private
> information so you definitely don't want to post this here. You may want
> to follow up with an anonymized example of a show sess if you want, and/or
> with any possibly relevant new information.

As this single subdomain is only used for delivering some banner images, the
privacy concerns about handing out the dumps are much smaller. I'd of course
still send these dumps in private.

Kind regards
Tobias
