Re: [2.2.9] 100% CPU usage

2021-03-05 Thread Willy Tarreau
On Fri, Mar 05, 2021 at 12:00:52PM +0100, Christopher Faulet wrote:
> On 05/03/2021 at 11:35, Maciej Zdeb wrote:
> > Hi Christopher,
> > 
> > Thanks, I'll check but it'll take a couple days because the issue is
> > quite rare. I'll return with feedback!
> > 
> > Maybe the patch is not backported to 2.2 because of commit message that
> > states only 2.3 branch?
> > 
> 
> That's it. And it was finally backported in 2.2 and 2.1.

Note, before 2.4, only a single thread can execute Lua scripts at a time,
with the others waiting behind, and if the Lua load is heavy, maybe
this can happen (but I've never experienced it yet, and the preemption
interval is short enough not to cause issues in theory). However the
trace shows an issue on setjmp(), which doesn't make much sense in
theory, unless we consider that it's triggered there because it's the
first syscall after waiting too long. Maciej, if this happens often,
would you be interested in running one machine on 2.4-dev11 ? We'd
need to have a quick look at your config (off-list if needed) to
figure what Lua parts could run in multi-thread.

Cheers,
Willy



Re: [ANNOUNCE] haproxy-2.3.6

2021-03-05 Thread Willy Tarreau
Hi William,

On Fri, Mar 05, 2021 at 01:28:34PM +0100, William Dauchy wrote:
> Hi,
> 
> On Wed, Mar 3, 2021 at 4:09 PM Christopher Faulet wrote:
> >    - An issue leading to possible infinite loops because of a double locking
> >      effect in the mt lists was fixed by Olivier. In the MT_LIST_TRY_ADDQ()
> >      macro, it was possible to try to lock twice the same element, making the
> >      second lock attempt to fail in loop.
> > Olivier Houchard (1):
> >       BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ().
> 
> not very clear in which conditions it can be triggered. Do you have
> more details about it?

That's something I encountered while trying to simplify some code, I
noticed that under certain circumstances my tests would deadlock, even
with a single thread. I seem to remember that it happens when there's
exactly one element in the list and you try to add it again into the
same list. It just turns out that the rare places where this is used
could not trigger this condition. And its sibling, MT_LIST_TRY_ADD()
was safe.

Hoping this helps,
Willy
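
The single-element case described above can be reproduced with a toy model. The sketch below is not HAProxy's macro: `struct node`, `BUSY`, `xchg()` and `try_addq()` are hypothetical names, the exchange is a plain single-threaded swap rather than an atomic operation, and the `check_self` early exit is only an assumed way out, not necessarily what the actual patch does. It only shows how locking `tail->next` and then `el->next` hits the same pointer twice when `el` is already the tail of the list.

```c
#include <stddef.h>

/* Toy list node; NOT HAProxy's struct mt_list. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Sentinel value meaning "this pointer is currently locked". */
#define BUSY ((struct node *)1)

/* Single-threaded stand-in for an atomic exchange: store val, return old. */
struct node *xchg(struct node **slot, struct node *val)
{
    struct node *old = *slot;
    *slot = val;
    return old;
}

/* A detached node (and an empty list head) points to itself. */
void node_init(struct node *n)
{
    n->next = n->prev = n;
}

/*
 * Try to append el at the tail of the list.
 * Returns 1 if appended, 0 if el was already linked,
 * and -1 if it gave up after max_tries attempts.
 * check_self enables a hypothetical guard against el being the tail.
 */
int try_addq(struct node *head, struct node *el, int max_tries, int check_self)
{
    for (int i = 0; i < max_tries; i++) {
        struct node *p = xchg(&head->prev, BUSY);   /* lock the tail pointer */
        if (p == BUSY)
            continue;
        if (check_self && p == el) {                /* el is already the tail */
            head->prev = p;
            return 0;
        }
        struct node *n = xchg(&p->next, BUSY);      /* lock tail->next */
        if (n == BUSY) {
            head->prev = p;
            continue;
        }
        struct node *n2 = xchg(&el->next, BUSY);    /* lock el->next */
        if (n2 == BUSY) {
            /* Looks like another thread holds it: unlock and retry.
             * When p == el we locked this very pointer one step earlier,
             * so this branch is taken on every attempt. */
            p->next = n;
            head->prev = p;
            continue;
        }
        if (n2 != el) {                             /* el is linked elsewhere */
            el->next = n2;
            p->next = n;
            head->prev = p;
            return 0;
        }
        /* el is detached: link it in, which also releases the locks. */
        el->prev = p;
        el->next = head;
        p->next = el;
        head->prev = el;
        return 1;
    }
    return -1;
}
```

With an empty list, `try_addq(&head, &a, 1000, 0)` returns 1. Calling it again for the same `a` uses up all 1000 attempts and returns -1, while passing `check_self = 1` returns 0 on the first attempt.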



[ANNOUNCE] haproxy-2.4-dev11

2021-03-05 Thread Willy Tarreau
Hi,

HAProxy 2.4-dev11 was released on 2021/03/05. It added 60 new commits
after version 2.4-dev10.

This version got a lot of cleanups for code style, typos, naming, etc,
and brings some improvements to the wireshark peers protocol dissector.

In addition, that left us some time to start to attack some long-lasting
annoying issues that frequently pop up on the issue tracker from people
getting trace dumps under many threads. Having had the opportunity to
run extended tests on an 8-core/16-thread then on a 64-core machine allowed
us to address another dose of high contention issues. Among them, I can
list:
  - excessive sharing on a few counters updated by the scheduler for
    stats reporting
  - excessive sharing of a few lists, such as the list of streams attached
    to a server in order to honor "shutdown sessions server" on the CLI.
  - missing CPU relax calls in the multi-threaded lists, so that the
    situation did not always recover
  - expensive locking of the idle lists that happened on every I/O wakeup
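
One common way to reduce the counter sharing mentioned in the first item is to give each thread its own cache-line-aligned slot and to sum the slots only when the stats are read. The sketch below is not taken from HAProxy's code: it assumes a 64-byte cache line and a fixed `MAX_THREADS`, and all names are hypothetical.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64   /* hypothetical upper bound on thread count */
#define CACHE_LINE  64   /* assumed cache-line size in bytes */

/* One slot per thread, aligned so that two slots never share a cache line. */
struct padded_counter {
    alignas(CACHE_LINE) atomic_uint_fast64_t value;
};

struct sharded_counter {
    struct padded_counter slot[MAX_THREADS];
};

/* Hot path: each thread only touches its own slot. */
void counter_add(struct sharded_counter *c, unsigned tid, uint64_t n)
{
    atomic_fetch_add_explicit(&c->slot[tid % MAX_THREADS].value, n,
                              memory_order_relaxed);
}

/* Cold path (stats reporting): walk all slots and add them up. */
uint64_t counter_read(struct sharded_counter *c)
{
    uint64_t sum = 0;

    for (unsigned i = 0; i < MAX_THREADS; i++)
        sum += atomic_load_explicit(&c->slot[i].value, memory_order_relaxed);
    return sum;
}
```

The trade-off is that reads become more expensive and only approximately consistent across threads, which is usually acceptable for statistics.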

On some test workloads running on 40 to 48 threads, the request rate
increased by a factor of 14-20 and the response time decreased by as much
(in fact we were way past the point where CPU usage was essentially contention).
But more importantly, I used to occasionally trigger some watchdog panics
under extreme contention on certain lists. Also, thanks to @ngaugler who
continues to run some tests in relation to issue #822, now I've become
strongly convinced that a number of the occasional reports of panics in
socket() or socket_at() when running on many threads were just the outcome
of the expensive locking of the idle lists: one of the traces he provided
me showed a thread being killed there on the lock after not having done
anything that could justify looping, and the link with the socket() call
is just that it's the first syscall after these locks, and that it can
definitely trigger the check for the CPU timeout.
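
To illustrate the kind of busy-waiting involved, here is a minimal sketch of a spin lock that issues a CPU relax hint between attempts and gives up after a bounded number of spins. It is not HAProxy's locking code: `cpu_relax()`, `struct spinlock` and `spin_lock_bounded()` are hypothetical names, and the choice of relax instruction per architecture is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Tell the CPU we are busy-waiting so that it can yield pipeline
 * resources to the sibling thread and reduce pressure on the lock's
 * cache line. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier only */
#endif
}

struct spinlock {
    atomic_flag held;
};

/* Spin at most max_spins times; return true if the lock was taken. */
bool spin_lock_bounded(struct spinlock *l, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;
        cpu_relax();
    }
    return false;
}

void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller that gets `false` back can do something else and retry later instead of spinning until a watchdog fires.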

For this reason I decided that some of these patches will have to be
backported because some users are facing performance or stability issues
under certain situations. The patches were arranged to be easier to
backport and a -next branch was created for 2.3 with the backport
candidates in it, which survived all tests and showed close to the same
performance gains.

As you can expect, I'm very interested in getting some test reports of
this version, especially from those facing occasional issues. In any case,
we'll try to emit another 2.3 next week, hopefully with some of these
improvements backported. I don't know yet if any of these ones will go to
2.2 though, time will tell.

There are still quite some cleanups pending in the todo list and some
issues to address but for now we're on the right track, so let's keep up
the good work, and have a nice week-end, all.

Please find the usual URLs below :
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Wiki             : https://github.com/haproxy/wiki/wiki
   Sources          : http://www.haproxy.org/download/2.4/src/
   Git repository   : http://git.haproxy.org/git/haproxy.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy.git
   Changelog        : http://www.haproxy.org/download/2.4/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy

PS: sorry for author "Ubuntu" below, it was me from a test machine, and
    I've been caught a few times by this: when re-editing the commit
    message later, the user never appears and I don't see that I need to
    fix it. It will certainly continue to happen until git commit exposes
    all fields like a mailer does :-/ Not a big deal anyway.

---
Complete changelog :
Amaury Denoyelle (7):
  CLEANUP: backend: fix a wrong comment
  BUG/MINOR: backend: free allocated bind_addr if reuse conn
  MINOR: backend: handle reuse for conns with no server as target
  REGTESTS: test http-reuse if no server target
  DOC: fix originalto except clause on destination address
  MINOR: backend: add a BUG_ON if conn mux NULL in connect_server
  BUG/MINOR: backend: fix condition for reuse on mode HTTP

Christopher Faulet (8):
  BUG/MINOR: tcp-act: Don't forget to set the original port for IPv4 set-dst rule
  BUG/MINOR: connection: Use the client's dst family for adressless servers
  BUG/MEDIUM: spoe: Kill applets if there are pending connections and nbthread > 1
  DOC: spoe: Add a note about fragmentation support in HAProxy
  BUG/MINOR: hlua: Don't strip last non-LWS char in hlua_pushstrippedstring()
  BUG/MINOR: server-state: Don't load server-state file for disabled backends
  CLEANUP: dns: Use DISGUISE() on a never-failing ring_attach() call
  CLEANUP: dns: Remove useless test on ns->dgram in dns_connect_nameserver()

Frédéric Lécaille (4):
  BUILD: proxy: 

Re: [PATCH] fix some typo

2021-03-05 Thread Willy Tarreau
On Thu, Mar 04, 2021 at 11:28:55PM +0500, Ilya wrote:
> Hello,
> 
> another round of typo cleanup

Now applied, thanks Ilya!
Willy



Re: [ANNOUNCE] haproxy-2.3.6

2021-03-05 Thread William Dauchy
Hi,

On Wed, Mar 3, 2021 at 4:09 PM Christopher Faulet wrote:
>    - An issue leading to possible infinite loops because of a double locking
>      effect in the mt lists was fixed by Olivier. In the MT_LIST_TRY_ADDQ()
>      macro, it was possible to try to lock twice the same element, making the
>      second lock attempt to fail in loop.
> Olivier Houchard (1):
>       BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ().

not very clear in which conditions it can be triggered. Do you have
more details about it?

Thanks,

-- 
William



Re: Logging down output from the a Lua script

2021-03-05 Thread Mihaly Zachar
On Fri, 5 Mar 2021 at 11:53, Adis Nezirovic wrote:

> On 3/4/21 9:47 PM, Mihaly Zachar wrote:
> > If I do this:
> > applet:set_var('txn.myvar', 'myvar_value')
> >
> > Then in the HAProxy layer I can reach the variable with %[var(txn.myvar)]
> > So it DOES work !
> > But Is this safe ? Did I do it well or I was just lucky ?
>
> Actions expose 'txn', while services expose full 'applet' object, so I
> do think it works as intended, it's not an accident. You are using Lua
> service for redirection?
>

Hi Adis,

Ok, thanks for the confirmation.
Yes, I did build a small webservice using HAProxy + Lua.
Sometimes it does send back 200 Ok with some content, sometimes it sends
back 302 based on some logic, it depends on the request.
It controls device provisioning.

Thanks,
Misi


Re: [2.2.9] 100% CPU usage

2021-03-05 Thread Christopher Faulet

On 05/03/2021 at 11:35, Maciej Zdeb wrote:
> Hi Christopher,
>
> Thanks, I'll check but it'll take a couple days because the issue is quite rare.
> I'll return with feedback!
>
> Maybe the patch is not backported to 2.2 because of commit message that states
> only 2.3 branch?

That's it. And it was finally backported in 2.2 and 2.1.

--
Christopher Faulet



Re: Logging down output from the a Lua script

2021-03-05 Thread Adis Nezirovic

On 3/4/21 9:47 PM, Mihaly Zachar wrote:
> If I do this:
> applet:set_var('txn.myvar', 'myvar_value')
>
> Then in the HAProxy layer I can reach the variable with %[var(txn.myvar)]
> So it DOES work !
> But Is this safe ? Did I do it well or I was just lucky ?

Actions expose 'txn', while services expose full 'applet' object, so I
do think it works as intended, it's not an accident. You are using Lua
service for redirection?


Best regards,
--
Adis Nezirovic
Software Engineer
HAProxy Technologies - Powering your uptime!
375 Totten Pond Road, Suite 302 | Waltham, MA 02451, US
+1 (844) 222-4340 | https://www.haproxy.com



Re: [2.2.9] 100% CPU usage

2021-03-05 Thread Maciej Zdeb
Hi Christopher,

Thanks, I'll check but it'll take a couple days because the issue is quite
rare. I'll return with feedback!

Maybe the patch is not backported to 2.2 because of commit message that
states only 2.3 branch?

Kind regards,

On Thu, 4 Mar 2021 at 22:34, Christopher Faulet wrote:

> On 04/03/2021 at 14:01, Maciej Zdeb wrote:
> > Hi,
> >
> > Sometimes after HAProxy reload it starts to loop infinitely, for example 9 of 10
> > threads using 100% CPU (gdb sessions attached). I've also dumped the core file
> > from gdb.
> >
> Hi Maciej,
>
> The 2.2.10 is out. But I'm afraid that a fix is missing. Could you test with the
> attached patch please ? On top of the 2.2.9 or 2.2.10, as you want.
>
> Thanks,
> --
> Christopher Faulet
>