Re: [2.2.9] 100% CPU usage
On Fri, Mar 05, 2021 at 12:00:52PM +0100, Christopher Faulet wrote:
> On 05/03/2021 at 11:35, Maciej Zdeb wrote:
> > Hi Christopher,
> >
> > Thanks, I'll check but it'll take a couple days because the issue is
> > quite rare. I'll return with feedback!
> >
> > Maybe the patch is not backported to 2.2 because of commit message that
> > states only 2.3 branch?
>
> That's it. And it was finally backported in 2.2 and 2.1.

Note, before 2.4, only a single thread can execute Lua scripts at a time, with the others waiting behind, and if the Lua load is heavy, maybe this can happen (but I've never experienced it yet, and the preemption interval is short enough not to cause issues in theory).

However the trace shows an issue on setjmp(), which doesn't make much sense in theory, unless we consider that it's triggered there because it's the first syscall after waiting too long.

Maciej, if this happens often, would you be interested in running one machine on 2.4-dev11? We'd need to have a quick look at your config (off-list if needed) to figure out which Lua parts could run multi-threaded.

Cheers,
Willy
Re: [ANNOUNCE] haproxy-2.3.6
Hi William,

On Fri, Mar 05, 2021 at 01:28:34PM +0100, William Dauchy wrote:
> Hi,
>
> On Wed, Mar 3, 2021 at 4:09 PM Christopher Faulet wrote:
> > - An issue leading to possible infinite loops because of a double locking
> >   effect in the mt lists was fixed by Olivier. In the MT_LIST_TRY_ADDQ()
> >   macro, it was possible to try to lock the same element twice, making
> >   the second lock attempt fail in a loop.
>
> > Olivier Houchard (1):
> >   BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ().
>
> not very clear in which conditions it can be triggered. Do you have
> more details about it?

That's something I encountered while trying to simplify some code: I noticed that under certain circumstances my tests would deadlock, even with a single thread. I seem to remember that it happens when there's exactly one element in the list and you try to add it again into the same list. It just turns out that the rare places where this is used could not trigger this condition. And its sibling, MT_LIST_TRY_ADD(), was safe.

Hoping this helps,
Willy
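To make the "locking the same element twice" scenario above more concrete, here is a minimal single-threaded sketch. It is a simplified, hypothetical model and not HAProxy's actual mt_list code: the names `struct node`, `BUSY`, `take()` and `try_append()` are invented for illustration, the plain pointer swap stands in for an atomic exchange, and `max_tries` stands in for what would be an endless retry loop. In this model the append can never succeed when the element is already the tail of the list, which includes the single-element case mentioned above; the real trigger conditions may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a list whose pointers double as locks. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Sentinel stored in a pointer field while that field is "locked". */
#define BUSY ((struct node *)1)

enum add_result { ADDED, ALREADY_LISTED, GAVE_UP };

/* A detached node (and an empty list head) points to itself. */
static void node_init(struct node *n)
{
    n->next = n->prev = n;
}

/* Lock a pointer field: store BUSY and return the previous value.
 * Single-threaded stand-in for an atomic exchange. */
static struct node *take(struct node **field)
{
    struct node *old = *field;
    *field = BUSY;
    return old;
}

/* Try to append el at the tail of the list. With detect_self == false the
 * element's own next pointer is locked a second time when el is already the
 * tail, so every attempt rolls back and retries. */
static enum add_result try_append(struct node *head, struct node *el,
                                  bool detect_self, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        struct node *last = take(&head->prev);
        if (last == BUSY)
            continue;
        struct node *after = take(&last->next);
        if (after == BUSY) {
            head->prev = last;
            continue;
        }
        if (detect_self && last == el) {
            /* el is the tail we have just locked: undo and report it. */
            last->next = after;
            head->prev = last;
            return ALREADY_LISTED;
        }
        struct node *el_next = take(&el->next);
        if (el_next == BUSY) {
            /* If el == last, the lock we are waiting for is our own. */
            last->next = after;
            head->prev = last;
            continue;
        }
        struct node *el_prev = take(&el->prev);
        if (el_prev == BUSY) {
            el->next = el_next;
            last->next = after;
            head->prev = last;
            continue;
        }
        if (el_next != el || el_prev != el) {
            /* el is linked somewhere else in a list: restore everything. */
            el->next = el_next;
            el->prev = el_prev;
            last->next = after;
            head->prev = last;
            return ALREADY_LISTED;
        }
        /* Link el between the old tail and the head, releasing all locks. */
        el->next = after;
        el->prev = last;
        last->next = el;
        head->prev = el;
        return ADDED;
    }
    return GAVE_UP;
}
```

Appending a detached node to an empty list returns `ADDED`. Appending the same node again with `detect_self` set to `false` exhausts `max_tries` and returns `GAVE_UP`, which is this model's equivalent of looping forever; with `detect_self` set to `true` it returns `ALREADY_LISTED` on the first attempt.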
[ANNOUNCE] haproxy-2.4-dev11
Hi,

HAProxy 2.4-dev11 was released on 2021/03/05. It added 60 new commits after version 2.4-dev10.

This version got a lot of cleanups for code style, typos, naming, etc, and brings some improvements to the wireshark peers protocol dissector.

In addition, that left us some time to start to attack some long-lasting annoying issues that frequently pop up on the issue tracker from people getting trace dumps under many threads. Having had the opportunity to run extended tests on an 8-core/16-thread machine, then on a 64-core machine, allowed us to address another dose of high contention issues. Among them, I can list:

  - excessive sharing on a few counters updated by the scheduler for stats reporting
  - excessive sharing of a few lists, such as the list of streams attached to a server in order to honor "shutdown server sessions" on the CLI
  - missing CPU relax calls in the multi-threaded lists, so that the situation did not always recover
  - expensive locking of the idle lists that happened on every I/O wakeup

On some test workloads running on 40 to 48 threads, the request rate increased by a factor of 14-20 and the response time decreased by as much (in fact we were way past the point where CPU usage was essentially contention). But more importantly, I used to occasionally trigger some watchdog panics under extreme contention on certain lists.

Also, thanks to @ngaugler who continues to run some tests in relation to issue #822, I've now become strongly convinced that a number of the occasional reports of panics in socket() or socket_at() when running on many threads were just the outcome of the expensive locking of the idle lists: one of the traces he provided showed a thread being killed there on the lock after not having done anything that could justify looping, and the link with the socket() call is just that it's the first syscall after these locks, and that it can definitely trigger the check for the CPU timeout.
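As an illustration of the first item in the list above (excessive sharing on counters), one common way to reduce that kind of contention is to give each thread its own cache-line-padded slot and only sum the slots when the value is reported. The sketch below is a generic example of that technique and not the change actually made in HAProxy; the names `sharded_counter`, `counter_add()` and `counter_total()` are hypothetical, and both `MAX_THREADS` and the 64-byte `CACHE_LINE` are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64   /* assumption: upper bound on thread count */
#define CACHE_LINE  64   /* assumption: typical x86-64 cache line size */

/* One slot per thread, aligned so that two slots never share a cache line. */
struct padded_slot {
    _Alignas(CACHE_LINE) _Atomic uint64_t value;
};

struct sharded_counter {
    struct padded_slot slot[MAX_THREADS];
};

static void counter_init(struct sharded_counter *c)
{
    for (unsigned i = 0; i < MAX_THREADS; i++)
        atomic_init(&c->slot[i].value, 0);
}

/* Hot path: each thread only ever writes to its own slot, so updates do
 * not bounce a shared cache line between CPUs. */
static void counter_add(struct sharded_counter *c, unsigned tid, uint64_t n)
{
    assert(tid < MAX_THREADS);
    atomic_fetch_add_explicit(&c->slot[tid].value, n, memory_order_relaxed);
}

/* Cold path (e.g. stats reporting): sum all slots. The result is only an
 * approximate snapshot while other threads keep updating. */
static uint64_t counter_total(struct sharded_counter *c)
{
    uint64_t sum = 0;

    for (unsigned i = 0; i < MAX_THREADS; i++)
        sum += atomic_load_explicit(&c->slot[i].value, memory_order_relaxed);
    return sum;
}
```

The trade-off is memory (one cache line per thread per counter) and a slightly more expensive read, which is usually acceptable for counters that are updated far more often than they are displayed.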
For this reason I decided that some of these patches will have to be backported because some users are facing performance or stability issues under certain situations. The patches were arranged to be easier to backport and a -next branch was created for 2.3 with the backport candidates in it, which survived all tests and showed close to the same performance gains.

As you can expect, I'm very interested in getting some test reports of this version, especially from those facing occasional issues. In any case, we'll try to emit another 2.3 next week, hopefully with some of these improvements backported. I don't know yet if any of these will go to 2.2 though, time will tell.

There are still quite some cleanups pending in the todo list and some issues to address, but for now we're on the right track, so let's keep up the good work, and have a nice week-end, all.

Please find the usual URLs below :
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Wiki             : https://github.com/haproxy/wiki/wiki
   Sources          : http://www.haproxy.org/download/2.4/src/
   Git repository   : http://git.haproxy.org/git/haproxy.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy.git
   Changelog        : http://www.haproxy.org/download/2.4/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy

PS: sorry for author "Ubuntu" below, it was me from a test machine, and I've been caught a few times by this: when re-editing the commit message later, the user never appears and I don't see that I need to fix it. It will certainly continue to happen until git commit exposes all fields like a mailer does :-/ Not a big deal anyway.
---
Complete changelog :
Amaury Denoyelle (7):
      CLEANUP: backend: fix a wrong comment
      BUG/MINOR: backend: free allocated bind_addr if reuse conn
      MINOR: backend: handle reuse for conns with no server as target
      REGTESTS: test http-reuse if no server target
      DOC: fix originalto except clause on destination address
      MINOR: backend: add a BUG_ON if conn mux NULL in connect_server
      BUG/MINOR: backend: fix condition for reuse on mode HTTP

Christopher Faulet (8):
      BUG/MINOR: tcp-act: Don't forget to set the original port for IPv4 set-dst rule
      BUG/MINOR: connection: Use the client's dst family for adressless servers
      BUG/MEDIUM: spoe: Kill applets if there are pending connections and nbthread > 1
      DOC: spoe: Add a note about fragmentation support in HAProxy
      BUG/MINOR: hlua: Don't strip last non-LWS char in hlua_pushstrippedstring()
      BUG/MINOR: server-state: Don't load server-state file for disabled backends
      CLEANUP: dns: Use DISGUISE() on a never-failing ring_attach() call
      CLEANUP: dns: Remove useless test on ns->dgram in dns_connect_nameserver()

Frédéric Lécaille (4):
      BUILD: proxy:
Re: [PATCH] fix some typo
On Thu, Mar 04, 2021 at 11:28:55PM +0500, ??? wrote:
> Hello,
>
> another round of typo cleanup

Now applied, thanks Ilya!

Willy
Re: [ANNOUNCE] haproxy-2.3.6
Hi,

On Wed, Mar 3, 2021 at 4:09 PM Christopher Faulet wrote:
> - An issue leading to possible infinite loops because of a double locking
>   effect in the mt lists was fixed by Olivier. In the MT_LIST_TRY_ADDQ()
>   macro, it was possible to try to lock the same element twice, making the
>   second lock attempt fail in a loop.

> Olivier Houchard (1):
>   BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ().

It is not very clear under which conditions it can be triggered. Do you have more details about it?

Thanks,
--
William
Re: Logging down output from the a Lua script
On Fri, 5 Mar 2021 at 11:53, Adis Nezirovic wrote:
> On 3/4/21 9:47 PM, Mihaly Zachar wrote:
> > If I do this:
> > applet:set_var('txn.myvar', 'myvar_value')
> >
> > Then in the HAProxy layer I can reach the variable with %[var(txn.myvar)]
> > So it DOES work !
> > But Is this safe ? Did I do it well or I was just lucky ?
>
> Actions expose 'txn', while services expose full 'applet' object, so I
> do think it works as intended, it's not an accident. You are using Lua
> service for redirection?

Hi Adis,

Ok, thanks for the confirmation.

Yes, I built a small web service using HAProxy + Lua. Depending on the request, it sometimes sends back 200 OK with some content and sometimes sends back a 302, based on some logic. It controls device provisioning.

Thanks,
Misi
Re: [2.2.9] 100% CPU usage
On 05/03/2021 at 11:35, Maciej Zdeb wrote:
> Hi Christopher,
>
> Thanks, I'll check but it'll take a couple days because the issue is
> quite rare. I'll return with feedback!
>
> Maybe the patch is not backported to 2.2 because of commit message that
> states only 2.3 branch?

That's it. And it was finally backported in 2.2 and 2.1.

--
Christopher Faulet
Re: Logging down output from the a Lua script
On 3/4/21 9:47 PM, Mihaly Zachar wrote:
> If I do this:
> applet:set_var('txn.myvar', 'myvar_value')
>
> Then in the HAProxy layer I can reach the variable with %[var(txn.myvar)]
> So it DOES work !
> But Is this safe ? Did I do it well or I was just lucky ?

Actions expose 'txn', while services expose full 'applet' object, so I do think it works as intended, it's not an accident. You are using Lua service for redirection?

Best regards,
--
Adis Nezirovic
Software Engineer
HAProxy Technologies - Powering your uptime!
375 Totten Pond Road, Suite 302 | Waltham, MA 02451, US
+1 (844) 222-4340 | https://www.haproxy.com
Re: [2.2.9] 100% CPU usage
Hi Christopher,

Thanks, I'll check but it'll take a couple of days because the issue is quite rare. I'll return with feedback!

Maybe the patch was not backported to 2.2 because the commit message only mentions the 2.3 branch?

Kind regards,

On Thu, 4 Mar 2021 at 22:34, Christopher Faulet wrote:
> On 04/03/2021 at 14:01, Maciej Zdeb wrote:
> > Hi,
> >
> > Sometimes after HAProxy reload it starts to loop infinitely, for example
> > 9 of 10 threads using 100% CPU (gdb sessions attached). I've also dumped
> > the core file from gdb.
>
> Hi Maciej,
>
> The 2.2.10 is out. But I'm afraid that a fix is missing. Could you test
> with the attached patch please ? On top of the 2.2.9 or 2.2.10, as you want.
>
> Thanks,
> --
> Christopher Faulet