January 29, 2019 4:24:57 AM CET Willy Tarreau <w...@1wt.eu> wrote:
Hello Louis,

On Mon, Jan 28, 2019 at 10:43:37PM +0100, Louis Chanouha wrote:
> Hello,
> This evening we faced a critical issue where all agent-checks were
> stuck (or retrying far more slowly than usual).
> For example, I see "2h39m DOWN 6/15 ?" for more than 2h on several backend
> servers. So all down servers stayed down until I manually forced the "UP" state.
> The problem seemed to start from a (legitimate) L7 timeout and I can now see
> constant 300% CPU usage.
> 
> We use HAProxy 1.9.2 compiled from Debian sources. We do not use any
> 1.9-specific option (no htx) and never had this kind of bug before. We use
> threading (nbthread = 3) and a lot of L7 custom checks (tcp-check expect
> string ...).

I guess that if it's the first time you're seeing this, it might not happen
often enough to be easily debugged. However, what is the previous version
you've used where you feel reasonably confident you would have known if it
had happened ?

I'm pretty sure this bug is specific to version 1.9. Last week I restarted the
process because it seemed to be stuck at around 100% CPU, but without any
abnormal behaviour.
I never saw that in the 1.7 or 1.8 series. We migrated from 1.8.15 to 1.9.2.

In 3 years, I have never seen HAProxy use more than 30% of our VM's CPU.

I suspect it might be related to some of the recent fixes on the checks
code which is unfortunately still shared with mailers and which caused
them to loop like crazy. Since then we've found two remaining bugs in
this area that were addressed after 1.9.2 and are already pending in the
maintenance branch, scheduled for release (hopefully today). One of them
(the issue with the task wake up) could possibly result in this as a side
effect.

> I did not restart our HAProxy, to help debugging. What can I provide to help?
> The log file isn't useful.

If you want, you can take a core dump of the process using gdb. You attach
it to the process (gdb --pid $(pidof haproxy)) and issue "generate-core-file".
It will produce a core file that may be reused later with your executable
if we figure that we'd possibly need something from it. Please don't forget
to keep a copy of the executable with this core. This way you can safely
kill this process and restart it.
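
The steps above could be scripted roughly as follows. This is only a sketch under assumptions: OUT_DIR and the dump_core helper are hypothetical names, the /proc/<pid>/exe path assumes Linux, and the loop is there because pidof may return several PIDs in master/worker mode.

```shell
#!/bin/sh
# Sketch: write a core file for each running haproxy process and keep a
# copy of the executable next to it.
# OUT_DIR is a hypothetical location; pick one with enough free space.
OUT_DIR="${OUT_DIR:-/var/tmp/haproxy-debug}"

dump_core() {
    pid="$1"
    # Attach to the process, write the core, then let gdb exit (it
    # detaches from an attached process on exit).
    gdb --batch --pid "$pid" -ex "generate-core-file $OUT_DIR/core.$pid"
    # Keep the exact binary the core came from (Linux-specific path).
    cp "/proc/$pid/exe" "$OUT_DIR/haproxy.$pid"
}

mkdir -p "$OUT_DIR"
# Does nothing if no haproxy process is found.
for pid in $(pidof haproxy); do
    dump_core "$pid"
done
```

Attaching normally requires root or the same user as the haproxy process, and the resulting files can contain private keys, so they should be stored and shared accordingly.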

Since I guess there could be private keys in these files, I will send you the
core dumps (master/worker) and/or the HAProxy configuration file privately. I
hope it helps.

> [Sorry for my English]

No problem at all with your English, at least from another frenchie :-)

:) 

Have a good day,
Louis

Thanks,
Willy

--

Louis Chanouha | Missions SCOUT et CLOUD UFTMiP    
Service Numérique de l'Université de Toulouse
Université Fédérale Toulouse Midi-Pyrénées    
Maison de la Recherche et de la Valorisation - MRV
118 route de Narbonne - 31062 Toulouse Cedex 09
Tél. : +33 5 61 10 80 45 /    poste int. : 12 80 45    
louis.chano...@univ-toulouse.fr
www.univ-toulouse.fr
