January 29, 2019 4:24:57 AM CET Willy Tarreau <w...@1wt.eu> wrote:
> Hello Louis,
>
> On Mon, Jan 28, 2019 at 10:43:37PM +0100, Louis Chanouha wrote:
> > Hello,
> >
> > We faced a critical issue this evening where all agent-checks were
> > stuck (or retried very, very much slower than usual).
> > For example, I see "2h39m DOWN 6/15 ?" for more than 2h on several backend
> > servers. So all down servers stayed down until I manually forced the "UP" state.
> > The problem seemed to start from a (legitimate) L7 timeout and I can now see
> > a steady 300% CPU usage.
> >
> > We use HAProxy 1.9.2 compiled from Debian sources. We do not use any 1.9
> > specific option (no htx) and never had this kind of bug before. We use
> > threading (nbthread = 3) and a lot of L7 custom checks (tcp-check expect
> > string ...).
>
> I guess that if it's the first time you're seeing this, it might not happen
> often enough to be easily debugged. However, what is the previous version
> you've used where you feel reasonably confident you would have known if it
> had happened ?

I'm pretty sure this bug is specific to version 1.9. Last week I restarted the
process because it seemed to be stuck at around 100% CPU, but without any
abnormal behaviour. I never saw that in the 1.7 or 1.8 series.

We migrated from 1.8.15 to 1.9.2. In 3 years, I never saw HAProxy use more
than 30% of our VM's CPU.

> I suspect it might be related to some of the recent fixes on the checks code
> which is unfortunately still shared with mailers and which caused them to
> loop like crazy. Since then we've found two remaining bugs in this area that
> were addressed after 1.9.2 and are already pending in the maintenance branch,
> scheduled for release (hopefully today). One of them (the issue with the task
> wake up) could possibly result in this as a side effect.
>
> > I did not restart our HAProxy, to help debugging. What can I provide to help ?
> > The logfile isn't useful.
>
> If you want, you can take a core dump of the process using gdb. You attach
> it to the process (gdb --pid $(pidof haproxy)) and issue "generate-core-file".
> It will produce a core file that may be reused later with your executable
> if we figure that we'd possibly need something from it. Please don't forget
> to keep a copy of the executable with this core. This way you can safely
> kill this process and restart it.

As I guess there could be private keys in these files, I will send you the
core dumps (master/worker) and/or the haproxy conf file privately.
Hope it will help.

> > [Sorry for my english]
>
> No problem at all with your english, at least from another frenchie :-)

:)

Have a nice day,
Louis

> Thanks,
> Willy

--
Louis Chanouha | Missions SCOUT et CLOUD UFTMiP
Service Numérique de l'Université de Toulouse
Université Fédérale Toulouse Midi-Pyrénées
Maison de la Recherche et de la Valorisation - MRV
118 route de Narbonne - 31062 Toulouse Cedex 09
Tél. : +33 5 61 10 80 45 / poste int. : 12 80 45
louis.chano...@univ-toulouse.fr
Facebook | Twitter | www.univ-toulouse.fr