Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Lukas, On Sun, Nov 12, 2017 at 08:09:29PM +0100, Lukas Tribus wrote: > Hello, > > > 2017-11-10 23:43 GMT+01:00 PiBa-NL : > > Okay have been running with haproxy-ss-20171017 for a day now. Sofar it > > sticks to <1% cpu usage. > > FYI a similar report is on discourse, on linux without splicing > involved. Upgrading from 1.7.9 to haproxy-ss-20171017 appears to have > solved the problem: > > https://discourse.haproxy.org/t/haproxy-consuming-100-cpu/1758 Thanks for the feedback. We'll release 1.7.10 ASAP with the pending fixes. It's been waiting too long now. Cheers, Willy
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hello, 2017-11-10 23:43 GMT+01:00 PiBa-NL : > Okay have been running with haproxy-ss-20171017 for a day now. Sofar it > sticks to <1% cpu usage. FYI a similar report is on discourse, on linux without splicing involved. Upgrading from 1.7.9 to haproxy-ss-20171017 appears to have solved the problem: https://discourse.haproxy.org/t/haproxy-consuming-100-cpu/1758 Lukas
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Pieter, On Fri, Nov 10, 2017 at 11:43:41PM +0100, PiBa-NL wrote: > Okay have been running with haproxy-ss-20171017 for a day now. Sofar it > sticks to <1% cpu usage. > > Will report if anything changes, cant tell for sure as don't have a clear > reproduction of the issue, but my issue 'seems' fixed. Great! If anything changes and you want to report what is wrong, please don't forget to issue the following : - "show sess all" on the cli - attach gdb to the running process and issue a few series of "bt full", "cont", ctrl-c, "bt full", ... to try to catch the code path commonly called. Cheers, Willy
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Willy, Op 9-11-2017 om 5:45 schreef Willy Tarreau: Hi Pieter, We had something similar on Linux in relation with TCP splicing and the fd cache, for which a fix was emitted. But yesterday Christopher explained me that the fix has an impact on the way applets are scheduled in 1.8, so actually it could mean that the initial bug might possibly cover a larger scope than splicing only, and that recv+send might also be affected. If you're interested in testing, the commit in 1.7 is c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and is present in the latest snapshot (we really need to emit 1.7.10 BTW). I'd be curious to know if it fixes it or not. At least it will tell us if that's related to this fd cache thing or to something completely different such as Lua. I also need to check with Thierry if we could find a way to add some stats about the time spent in Lua to "show info" to help debugging such cases where Lua is involved. By the way, thanks for your dump, we'll check the sessions' statuses. There are not that many, and maybe it will give us a useful indication! Cheers, Willy Okay have been running with haproxy-ss-20171017 for a day now. Sofar it sticks to <1% cpu usage. Will report if anything changes, cant tell for sure as don't have a clear reproduction of the issue, but my issue 'seems' fixed. Regards, PiBa-NL / Pieter
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
On Fri, Nov 10, 2017 at 05:25:11AM +0100, Willy Tarreau wrote: > I just checked the contents and its OK. I think we have something wrong > on the script which generates the snapshots. The git version and the date > are normally updated on the fly by git when building the archive, thanks > to attributes. I'm not seeing the attribute file anymore in the repository > so it's possible we've lost it during the last server migration, I'll take > care of it. Now fixed, 1.4 to 1.7 were affected. Future snapshots should be OK starting from tomorrow. Willy
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Pieter, On Fri, Nov 10, 2017 at 12:18:30AM +0100, PiBa-NL wrote: > Hi Willy, List, > > Is it correct that when i build a haproxy-ss-20171017 snapshot that the > version still shows up as: > "HAProxy version 1.7.9, released 2017/08/18" > on both haproxy -vv and stats page.? > > Or did i do it wrong? I just checked the contents and its OK. I think we have something wrong on the script which generates the snapshots. The git version and the date are normally updated on the fly by git when building the archive, thanks to attributes. I'm not seeing the attribute file anymore in the repository so it's possible we've lost it during the last server migration, I'll take care of it. Thanks for the report. > p.s. I changed the Makefile like this: > PORTNAME= haproxy-ss > PORTVERSION= 20171017 > CATEGORIES= net www > MASTER_SITES= http://www.haproxy.org/download/1.7/src/snapshot/ > > And then ran: > make clean build install NO_CHECKSUM=yes > > Which did 'seem' to download the 'intended?' file.. I don't know these files above, they're probably related to your system, but in the snapshot file I'm clearly seeing that this is wrong : bash-4.2$ cd haproxy-ss-20171017/ bash-4.2$ cat SUBVERS -$Format:%h$ bash-4.2$ cat VERDATE $Format:%ci$ 2017/08/18 Normally these ones are changed on the fly by git-archive. Don't worry, the problem is on our side :-) Thanks for reporting this, Willy
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Willy, List, Is it correct that when i build a haproxy-ss-20171017 snapshot that the version still shows up as: "HAProxy version 1.7.9, released 2017/08/18" on both haproxy -vv and stats page.? Or did i do it wrong? p.s. I changed the Makefile like this: PORTNAME= haproxy-ss PORTVERSION= 20171017 CATEGORIES= net www MASTER_SITES= http://www.haproxy.org/download/1.7/src/snapshot/ And then ran: make clean build install NO_CHECKSUM=yes Which did 'seem' to download the 'intended?' file.. Thanks, PiBa-NL / Pieter
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Willy, Op 9-11-2017 om 5:45 schreef Willy Tarreau: Hi Pieter, On Thu, Nov 09, 2017 at 02:28:46AM +0100, PiBa-NL wrote: Actually haproxy has been running for a few weeks with 100% and i didnt notice.. it does keep working it seems.. Anyhow thought i would try and capture the next event if it would happen again. It did after a few hours.. After the truss output below the last line keeps repeating fast lots and lots of times. kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1 (0x1) recvfrom(1,0x8024ed972,16290,0,NULL,0x0) = 0 (0x0) kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) We had something similar on Linux in relation with TCP splicing and the fd cache, for which a fix was emitted. But yesterday Christopher explained me that the fix has an impact on the way applets are scheduled in 1.8, so actually it could mean that the initial bug might possibly cover a larger scope than splicing only, and that recv+send might also be affected. If you're interested in testing, the commit in 1.7 is c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and is present in the latest snapshot (we really need to emit 1.7.10 BTW). I'd be curious to know if it fixes it or not. At least it will tell us if that's related to this fd cache thing or to something completely different such as Lua. I also need to check with Thierry if we could find a way to add some stats about the time spent in Lua to "show info" to help debugging such cases where Lua is involved. By the way, thanks for your dump, we'll check the sessions' statuses. There are not that many, and maybe it will give us a useful indication! Cheers, Willy Thanks for your time, i didn't think the 'splice' problem mentioned on mailing-list would be relevant for my case so i didn't see a need to try latest snapshot. Couldn't find much other recent cpu issues there. But ill try and compile haproxy 1.7 latest snapshot or perhaps just 1.7.9 with this extra patch and see if it keeps running with low cpu usage for a few days.. I have not compiled haproxy for a while, ill see what works the easiest for me, assuming can make it work build on a separate FreeBSD machine and packaged/copied to the actual 'problem machine' that doesn't have compilation tools on it.. hopefully my build binary will be 'compatible'.. Will report back in a few day's.. Thanks, PiBa-NL / Pieter
Re: HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi Pieter, On Thu, Nov 09, 2017 at 02:28:46AM +0100, PiBa-NL wrote: > Actually haproxy has been running for a few weeks with 100% and i didnt > notice.. it does keep working it seems.. > > Anyhow thought i would try and capture the next event if it would happen > again. It did after a few hours.. > > After the truss output below the last line keeps repeating fast lots and > lots of times. > > kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) > kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1 > (0x1) > recvfrom(1,0x8024ed972,16290,0,NULL,0x0) = 0 (0x0) > kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) > kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) We had something similar on Linux in relation with TCP splicing and the fd cache, for which a fix was emitted. But yesterday Christopher explained me that the fix has an impact on the way applets are scheduled in 1.8, so actually it could mean that the initial bug might possibly cover a larger scope than splicing only, and that recv+send might also be affected. If you're interested in testing, the commit in 1.7 is c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and is present in the latest snapshot (we really need to emit 1.7.10 BTW). I'd be curious to know if it fixes it or not. At least it will tell us if that's related to this fd cache thing or to something completely different such as Lua. I also need to check with Thierry if we could find a way to add some stats about the time spent in Lua to "show info" to help debugging such cases where Lua is involved. By the way, thanks for your dump, we'll check the sessions' statuses. There are not that many, and maybe it will give us a useful indication! Cheers, Willy
HAProxy 1.7.9 FreeBSD 100% CPU usage
Hi List, I've experienced a issue where its using 100% cpu usage with haproxy 1.7.9 on FreeBSD 11.1p3 / pfSense 2.4.2dev. There is very little traffic actually hitting this haproxy instance. But it happened for the second time in a few days now. Actually haproxy has been running for a few weeks with 100% and i didnt notice.. it does keep working it seems.. Anyhow thought i would try and capture the next event if it would happen again. It did after a few hours.. After the truss output below the last line keeps repeating fast lots and lots of times. kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 1.0 }) = 0 (0x0) kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1 (0x1) recvfrom(1,0x8024ed972,16290,0,NULL,0x0) = 0 (0x0) kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) kevent(0,0x0,0,{ },7,{ 0.0 }) = 0 (0x0) I tried to gather all possible relevant info in attached file. Not using much special configuration options.. but i am using lua to server a small simple static response.. I'm not sure if its a problem that might be related to LUA, or perhaps there is some other issue.?. I've got tcpdump and complete truss output from before and while it happened after a few hours, but actually just a few request (+- 29).. But i would prefer to send these off list though, Willy if you desire i send em to your mail address? But maybe i have overlooked it on the mailinglist and its a known issue already..? Last connection which i think caused/triggered the issue is in screenshot(if it attaches right to the mail..)basically just a GET request which gets a ack, followed by a FYN,ACK packet from the client 30 seconds later again followed by a ack.. The LetsEncrypt backend that is part of the configuration never got a single request according to stats.. Is it a known issue? Are tcpdump/truss output desired ..? (where should i send em?) Is there any other output that can try to gather next time? Regards, PiBa-NL HA-Proxy version 1.7.9 2017/08/18 TARGET = freebsd [2.4.2-DEVELOPMENT][admin@pfsense.local]/root: /usr/local/pkg/haproxy/haproxy_socket.sh show sess all show sess all 0x80242b800: [08/Nov/2017:19:40:18.868158] id=15 proto=tcpv4 source=45.76.a.b:53752 flags=0x48a, conn_retries=0, srv_conn=0x0, pend_pos=0x0 frontend=www (id=3 mode=http), listener=37.97.x.y:80 (id=1) addr=37.97.x.y:80 backend= (id=-1 mode=-) server= (id=-1) task=0x80248f380 (state=0x04 nice=0 calls=4 exp= age=4h23m) txn=0x802421800 flags=0x820 meth=1 status=-1 req.st=MSG_BODY rsp.st=MSG_RPBEFORE waiting=0 si[0]=0x80242ba38 (state=EST flags=0x08 endp0=CONN:0x8024ca480 exp=, et=0x000) si[1]=0x80242ba60 (state=EST flags=0x4010 endp1=APPCTX:0x8024ca600 exp=, et=0x000) co0=0x8024ca480 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x8024ca300 flags=0x0025b300 fd=1 fd.state=22 fd.cache=0 updt=0 app1=0x8024ca600 st0=0 st1=0 st2=0 applet= req=0x80242b810 (f=0x80c020 an=0x0 pipe=0 tofwd=-1 total=94) an_exp= rex= wex= buf=0x8024ed900 data=0x8024ed914 o=94 p=94 req.next=94 i=0 size=16384 res=0x80242b850 (f=0x8040 an=0xa0 pipe=0 tofwd=0 total=0) an_exp= rex= wex= buf=0x783160 data=0x783174 o=0 p=0 rsp.next=0 i=0 size=0 0x80242ac00: [09/Nov/2017:00:04:24.403636] id=31 proto=unix_stream source=unix:1 flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0 frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1 backend= (id=-1 mode=-) server= (id=-1) task=0x80248f4d0 (state=0x0a nice=-64 calls=1 exp=10s age=?) si[0]=0x80242ae38 (state=EST flags=0x08 endp0=CONN:0x8024ca900 exp=, et=0x000) si[1]=0x80242ae60 (state=EST flags=0x4018 endp1=APPCTX:0x8024ca780 exp=, et=0x000) co0=0x8024ca900 ctrl=unix_stream xprt=RAW data=STRM target=LISTENER:0x8024ca000 flags=0x0020b306 fd=2 fd.state=25 fd.cache=0 updt=0 app1=0x8024ca780 st0=7 st1=0 st2=3 applet= req=0x80242ac10 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=15) an_exp= rex=10s wex= buf=0x8024e7dc0 data=0x8024e7dd4 o=0 p=0 req.next=0 i=0 size=16384 res=0x80242ac50 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1198) an_exp= rex= wex= buf=0x8025603c0 data=0x8025603d4 o=1198 p=1198 rsp.next=0 i=0 size=16384 FreeBSD pfsense.local 11.1-RELEASE-p3 FreeBSD 11.1-RELEASE-p3 #362 r313908+9cf44ec5484(RELENG_2_4): Fri Nov 3 08:23:14 CDT 2017 [2.4.2-DEVELOPMENT][admin@pfsense.local]/root: haproxy -vv HA-Proxy version 1.7.9 2017/08/18 Copyright 2000-2017 Willy Tarreau Build options : TARGET = freebsd CPU = generic CC = cc CFLAGS = -O2 -pipe -fstack-protector -fno-s