Re: Haproxy running on 100% CPU and slow downloads
To close this thread out: we found that the issue was in the 1.6.4-20160426 patch I was using. It is fixed in 1.6.5. Thanks, Willy and Lukas.

Thanks
Sachin

On 5/13/16, 8:14 PM, "Willy Tarreau" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
On Fri, May 13, 2016 at 07:32:36PM +0530, Sachin Shetty wrote:
> In 24 hours, connections grew on all servers, so we have reverted the
> patch for now.
>
> I have the "show sess all" output if you would like to see it.

Interestingly, in the "show sess all" output from yesterday I'm seeing only negative "tofwd" values for stuck sessions. That's exactly the type of thing which is supposedly fixed now (it's the problem with 2-4GB transfers). I don't understand, since I tested the backport and had confirmation from another user that it was OK for him. Maybe there's a corner case I haven't figured out, which may depend on certain options.

Could you please send me your config privately (remove the confidential stuff)? I think you've sent it to me a few times already, but I don't keep those, you know.

Thanks,
Willy
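Willy links the negative "tofwd" values to transfers of roughly 2 to 4 GB. As an illustration only, assuming the forward count ends up stored in, or printed from, a signed 32-bit integer (the thread does not spell this out), the sketch below shows why exactly that window turns negative: byte counts from 2^31 up to 2^32 - 1 wrap to negative values, smaller counts stay positive, and 2^32 wraps back to 0.

```python
import struct

GIB = 1024 ** 3

def as_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]

# Byte counts below 2 GiB survive; 2 GiB .. 4 GiB - 1 come out negative.
for size in (GIB, 2 * GIB - 1, 2 * GIB, 3 * GIB, 4 * GIB - 1, 4 * GIB):
    print(f"{size:>10} bytes -> {as_int32(size)}")
```

`struct` is used instead of hand-written modular arithmetic so the wrap matches a C two's-complement `int` bit for bit.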
Re: Haproxy running on 100% CPU and slow downloads
Hi Sachin,

On Fri, May 13, 2016 at 07:32:36PM +0530, Sachin Shetty wrote:
> In 24 hours, connections grew on all servers, so we have reverted the
> patch for now.
>
> I have the "show sess all" output if you would like to see it.

Thank you very much, that's extremely useful. I'll probably get back to you in the next few days if I find that I need more information.

Indeed, do not take risks with your production; our development model makes it easy for you to limit the risks by switching back, so stay safe!

Best regards,
Willy
Re: Haproxy running on 100% CPU and slow downloads
In 24 hours, connections grew on all servers, so we have reverted the patch for now. I have the "show sess all" output if you would like to see it.

Thanks
Sachin

On 5/12/16, 10:08 PM, "Sachin Shetty" wrote:
>Hi Lukas,
>
>Attached output.
>
>Thanks
>Sachin
>
>On 5/12/16, 7:41 PM, "Lukas Tribus" wrote:
>>(...)
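With a large "show sess all" dump, a quick way to check Willy's observation is to list the sessions whose channels report a negative tofwd. A minimal sketch: the layout it expects (a session header line starting with `0x<address>:` at column 0, followed by indented `req=`/`res=` channel lines containing `tofwd=<n>`) is an assumption based on 1.6-era output, so compare it with your own dump and adjust the two patterns if needed. The function name and the sample data are made up for illustration.

```python
import re
from typing import Dict, List, Optional

# Assumed layout: "0x<addr>: [...] id=..." starts a session block, and its
# channels appear on indented lines such as
# "  req=0x... (f=0x... an=0x0 pipe=0 tofwd=-1 total=123)".
SESSION_START = re.compile(r"^(0x[0-9a-fA-F]+):")
CHANNEL = re.compile(r"\b(req|res)=0x[0-9a-fA-F]+ .*?\btofwd=(-?\d+)")

def find_negative_tofwd(dump: str) -> Dict[str, List[str]]:
    """Map session address -> channels ('req'/'res') whose tofwd is negative."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in dump.splitlines():
        start = SESSION_START.match(line)
        if start:
            current = start.group(1)
            continue
        m = CHANNEL.search(line)
        if m and current is not None and int(m.group(2)) < 0:
            result.setdefault(current, []).append(m.group(1))
    return result

# Hypothetical two-session excerpt.
sample = """\
0x1a2b3c: [13/May/2016:10:00:00.000000] id=42 proto=tcpv4
  req=0x1a2b80 (f=0x848000 an=0x0 pipe=0 tofwd=0 total=512)
  res=0x1a2bc0 (f=0x80048000 an=0x0 pipe=0 tofwd=-1073741824 total=3000000000)
0x4d5e6f: [13/May/2016:10:00:01.000000] id=43 proto=tcpv4
  req=0x4d5e80 (f=0x848000 an=0x0 pipe=0 tofwd=0 total=300)
  res=0x4d5ec0 (f=0x80048000 an=0x0 pipe=0 tofwd=1024 total=2048)
"""
print(find_negative_tofwd(sample))
```

Whether a given negative value is abnormal is for the developers to judge; the sketch only collects candidates so they can be sent along with the full dump.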
Re: Haproxy running on 100% CPU and slow downloads
Hi,

On 12.05.2016 at 14:37, Sachin Shetty wrote:
> Hi Willy,
>
> We are seeing a strange problem on the patched server. (...)
>
> Right now, I see connections building up on one backend, reaching 150 in
> the last few hours, while the backend nginx only shows about 20 active
> connections.

Can you collect "show sess all" output from the admin socket?

Lukas
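The usual way to collect this is a one-liner such as `echo "show sess all" | socat stdio /var/run/haproxy.sock`. A minimal Python equivalent is sketched below, assuming the global section declares a UNIX stats socket (for example `stats socket /var/run/haproxy.sock level admin`; that path is hypothetical). The CLI reads one newline-terminated command and, in non-interactive mode, closes the connection after replying, so reading until EOF collects the whole dump.

```python
import socket

def query(sock: socket.socket, command: str) -> str:
    """Send one newline-terminated CLI command and read until the peer closes."""
    sock.sendall(command.rstrip("\n").encode() + b"\n")
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:  # HAProxy closes the socket once the reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def haproxy_cli(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Connect to an HAProxy stats socket (UNIX stream) and run one command."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        return query(sock, command)
```

Usage, with the hypothetical path replaced by the one from your own `stats socket` line: `print(haproxy_cli("/var/run/haproxy.sock", "show sess all"))`. Splitting out `query` keeps the protocol handling testable without a running HAProxy.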
Re: Haproxy running on 100% CPU and slow downloads
Hi Willy,

We are seeing a strange problem on the patched server. We have several haproxy servers running, but only one has the latest patch, and this haproxy has frozen twice in the last two days: it hits the maximum of 2000 open connections on the frontend and then stalls. According to the logs it has 1999 connections to one of the backends, which is nginx, but nginx_status shows only a few active connections. This only happens on the patched haproxy server and nowhere else. Interestingly, this is not the haproxy doing SSL: we run two haproxies on the same box with the latest binary, and the SSL one seems OK while the non-SSL one keeps accumulating connections.

Right now, I see connections building up on one backend, reaching 150 in the last few hours, while the backend nginx only shows about 20 active connections.

On 5/10/16, 5:47 PM, "Willy Tarreau" wrote:
>(...)
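To put numbers on this kind of mismatch, one can compare HAProxy's current session count for the backend with nginx's own counter. A minimal sketch, assuming the `show stat` CSV from the stats socket (its header line starts with `# pxname,svname,qcur,qmax,scur`) and nginx's stub_status page (which starts with `Active connections: N`). The backend name `nginx_backend` and the abbreviated sample data are hypothetical.

```python
import csv
import io
import re

def haproxy_backend_scur(show_stat_csv: str, backend: str) -> int:
    """Return current sessions (scur) from the BACKEND row of `backend`."""
    # The header line starts with "# pxname,..."; drop the "# " so that
    # csv.DictReader sees plain column names.
    text = show_stat_csv.lstrip("# ")
    for row in csv.DictReader(io.StringIO(text)):
        if row["pxname"] == backend and row["svname"] == "BACKEND":
            return int(row["scur"])
    raise KeyError(f"backend {backend!r} not found")

def nginx_active_connections(stub_status: str) -> int:
    """Parse 'Active connections: N' from nginx's stub_status output."""
    m = re.search(r"Active connections:\s*(\d+)", stub_status)
    if m is None:
        raise ValueError("no 'Active connections' line found")
    return int(m.group(1))

# Abbreviated, made-up samples (real show stat output has many more columns).
stat_csv = (
    "# pxname,svname,qcur,qmax,scur,smax\n"
    "nginx_backend,web1,0,0,150,1999\n"
    "nginx_backend,BACKEND,0,0,150,1999\n"
)
stub_status = (
    "Active connections: 20 \n"
    "server accepts handled requests\n"
    " 1000 1000 5000 \n"
    "Reading: 0 Writing: 5 Waiting: 15 \n"
)
gap = haproxy_backend_scur(stat_csv, "nginx_backend") - nginx_active_connections(stub_status)
print(f"haproxy holds {gap} more sessions than nginx reports")
```

nginx counts every client it serves, not only this haproxy, so only a gap where HAProxy's figure is much larger, like the 150 versus 20 reported above, suggests HAProxy is holding sessions that nginx no longer sees.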
Re: Haproxy running on 100% CPU and slow downloads
On Tue, May 10, 2016 at 11:10:14AM +0530, Sachin Shetty wrote:
> We deployed the latest version, and throughput still dropped a bit around
> peak hours, so we switched to nbproc 4, which is holding up OK.

So you were probably reaching the processing limits of a single process; that can easily happen with SSL if a lot of rekeying has to be done.

> Note that
> 4 CPUs were not sufficient earlier, so I believe the latest version
> scales better.

Good, that confirms that you're not facing these bugs anymore. I'm currently starting a new release, which will make it easier for you to deploy.

Thanks for the report,
Willy
Re: Haproxy running on 100% CPU and slow downloads
We deployed the latest version, and throughput still dropped a bit around peak hours, so we switched to nbproc 4, which is holding up OK. Note that 4 CPUs were not sufficient earlier, so I believe the latest version scales better. Thanks, Lukas and Willy.

On 4/29/16, 11:09 AM, "Willy Tarreau" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
Thanks, Lukas and Willy. I am in the process of getting 1.6.4-20160426 deployed in our QA environment; I will keep you posted.

On 4/29/16, 11:09 AM, "Willy Tarreau" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
Hi guys,

On Tue, Apr 26, 2016 at 08:46:37AM +0200, Lukas Tribus wrote:
> Hi Sachin,
>
> there is another fix Willy recently committed, it's ff9c7e24fb [1],
> and it's in the snapshots [2] since 1.6.4-20160426.
>
> This is supposed to fix the issue altogether.
>
> Please let us know if this works for you.

Yes, it should fix this. Please note that I've had one report on 1.5 of some huge transfers (multi-GB) stalling after this patch, and since I can't find any case where it could be wrong, nor can I reproduce it, I suspect we may have a bug somewhere else (at least in 1.5) that was hidden by the bug this series of patches fixes. We have had no such report on 1.6, however.

There's another case of high CPU usage which Cyril managed to isolate. The issue has been present since 1.4 and is *very* hard to reproduce; I even had to tweak some sysctls on my laptop to see it, and I am careful not to reboot it. It is triggered by *some* pipelined requests. We're currently working on fixing it; there are several ways to fix it, but all of them come with downsides for now (one of them being a different code path between 1.7 and 1.6/1.5/1.4, which doesn't appeal to me much).

This is why I'm still waiting before issuing a new series of versions.

In the meantime, feel free to test the latest 1.6 snapshot and report any issues you may face. I'm really committed to getting these issues fixed once and for all; it's getting irritating to see such bugs surviving, but I never give up the fight :-)

Best regards,
Willy
Re: Haproxy running on 100% CPU and slow downloads
Hi Sachin,

there is another fix Willy recently committed, it's ff9c7e24fb [1], and it's in the snapshots [2] since 1.6.4-20160426.

This is supposed to fix the issue altogether.

Please let us know if this works for you.

Thanks,
Lukas

[1] http://www.haproxy.org/git?p=haproxy-1.6.git;a=commitdiff_plain;h=ff9c7e24fbbc33074e5257297e38473a3411f407
[2] http://www.haproxy.org/download/1.6/src/snapshot/
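When several boxes run different builds, a small version comparison helps confirm which ones carry commit ff9c7e24fb. A minimal sketch, assuming snapshot versions look like `1.6.4-20160426` (release plus snapshot date, as used in this thread) and releases look like `1.6.5`; the helper names are hypothetical. It only compares within the 1.6 branch, and it only answers "is this commit present": per the end of the thread, 1.6.4-20160426 itself still had a problem that 1.6.5 fixed.

```python
import re
from typing import Tuple

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d{8}))?$")

def version_key(version: str) -> Tuple[int, int, int, int]:
    """'1.6.4' -> (1, 6, 4, 0); '1.6.4-20160426' -> (1, 6, 4, 20160426).

    A release sorts before the snapshots taken after it, and every
    1.6.4 snapshot sorts before the 1.6.5 release.
    """
    m = VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, date = m.groups()
    return int(major), int(minor), int(patch), (int(date) if date else 0)

# First 1.6 snapshot reported to carry commit ff9c7e24fb.
FIRST_WITH_FIX = version_key("1.6.4-20160426")

def has_fix(version: str) -> bool:
    """True if a 1.6 build is at or after the first snapshot with the fix."""
    key = version_key(version)
    if key[:2] != (1, 6):
        raise ValueError("only meaningful within the 1.6 branch")
    return key >= FIRST_WITH_FIX

for v in ("1.6.4", "1.6.4-20160412", "1.6.4-20160426", "1.6.5"):
    print(v, has_fix(v))
```

The 1.5 branch is rejected on purpose: fixes there arrive as separate backports, so comparing 1.5 numbers against a 1.6 snapshot date would be meaningless.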
Re: Haproxy running on 100% CPU and slow downloads
Hi Lukas,

We tried the patch, and it seems better. When we switched nbproc off (back to 1), throughput did not drop immediately as it did with the earlier version; it deteriorated slowly as traffic increased toward peak hours, but eventually it fell to the same levels as before. CPU usage was also better: only at peak hours did I see haproxy consuming 100% CPU; otherwise it stayed between 60% and 80%.

Please see the attached image measuring throughput: nbproc=20 until ~10 PM, nbproc=1 from ~10 PM to ~10 AM, and nbproc reverted to 20 from 10 AM onwards. The Y-axis is speed in MBPS.

Thanks
Sachin

On 4/21/16, 12:57 PM, "Lukas Tribus" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
Hi,

On 21.04.2016 at 08:11, Sachin Shetty wrote:
> Hi,
>
> any hints to further isolate this? We have deferred the problem by adding
> all the cores we had, but I have a feeling that our request rate is not
> that high (7K per minute at peak) and that the problem will show up again
> as traffic increases.
>
> Thanks
> Sachin

Try the fix 9c09ee87 [1], which is in snapshots since 1.6.4-20160412.

cheers,

lukas

[1] http://www.haproxy.org/git?p=haproxy-1.6.git;a=commitdiff_plain;h=9c09ee87836bb2efd78a17f9b16d8afe0ec64018;hp=3bee40bfb7a35b624c5cc9d88daff5a9e3b99f33
[2] http://www.haproxy.org/download/1.6/src/snapshot/
Re: Haproxy running on 100% CPU and slow downloads
Hi,

any hints to further isolate this? We have deferred the problem by adding all the cores we had, but I have a feeling that our request rate is not that high (7K per minute at peak) and that the problem will show up again as traffic increases.

Thanks
Sachin

On 4/18/16, 12:22 PM, "Sachin Shetty" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
Hi Lukas,

We upgraded to 1.6, went back from nbproc 12 to nbproc 1, and the problem showed up again: haproxy was hitting 90-100% CPU, and our monitors immediately reported the download speed dropping from 100MBPS to 10MBPS.

I ran strace as you suggested. The output is huge, so I have attached a small subset of it to this email. Please let me know if you need more of the strace output.

Thanks
Sachin

On 4/7/16, 5:51 PM, "Lukas Tribus" wrote:
>Hi,
>(...)

23:30:41.257757 sendto(120, "...", 16384, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16384
23:30:41.258001 sendto(87, "...", 919, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 919
23:30:41.258077 read(33, "\27\3\3\0020", 5) = 5
23:30:41.258134 read(33, "...", 560) = 560
23:30:41.258201 read(3, "\26\3\3\0F", 5) = 5
23:30:41.258244 read(3, "...", 70) = 70
23:30:41.259294 read(3, "\24\3\3\0\1", 5) = 5
23:30:41.259347 read(3, "\1", 1) = 1
23:30:41.259514 read(3, "\26\3\3\0@", 5) = 5
23:30:41.259559 read(3, "...", 64) = 64
23:30:41.259668 write(3, "...", 75) = 75
23:30:41.259748 read(3, 0x7feeaed21343, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.259818 read(71, "\26\3\1\2\6", 5) = 5
23:30:41.259863 read(71, "...", 518) = 518
23:30:41.280711 read(71, "\24\3\1\0\1", 5) = 5
23:30:41.280790 read(71, "\1", 1) = 1
23:30:41.280967 read(71, "\26\3\1\", 5) = 5
23:30:41.281012 read(71, "...", 48) = 48
23:30:41.281121 write(71, "...", 59) = 59
23:30:41.281199 read(71, 0x7feeaed21343, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.281246 read(51, "...", 14977) = 14977
23:30:41.281405 sendto(56, "...", 16384, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16384
23:30:41.281472 read(38, 0x7feeaeb15183, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.281517 read(140, "...", 7677) = 5840
23:30:41.281562 read(140, 0x7feeaec87a2b, 1837) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.281605 read(45, "\27\3\3\2\240", 5) = 5
23:30:41.281647 read(45, "...", 672) = 672
23:30:41.281699 read(31, "...", 48) = 48
23:30:41.281811 sendto(272, "...", 16384, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16384
23:30:41.281948 write(167, "...", 15525) = 15525
23:30:41.282025 read(72, "...", 15923) = 8184
23:30:41.282076 read(72, "...", 7739) = 1364
23:30:41.282119 read(72, 0x7feeaebf89c1, 6375) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.282162 read(24, "...", 1837) = 1837
23:30:41.282278 sendto(107, "...", 16384, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16384
23:30:41.282328 recvfrom(41, "...", 16384, 0, NULL, NULL) = 16384
23:30:41.282382 recvfrom(81, "...", 15360, 0, NULL, NULL) = 214
23:30:41.282438 write(21, "...", 389) = 389
23:30:41.282497 write(25, "...", 389) = 389
23:30:41.282563 write(25, "...", 53) = 53
23:30:41.282613 shutdown(25, SHUT_WR) = 0
23:30:41.282660 read(18, 0x7feeae813be3, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.282704 sendto(92, "...", 818, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 818
23:30:41.282753 read(39, 0x7feeae813be3, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.282796 sendto(88, "...", 2062, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 2062
23:30:41.282944 getsockname(33, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("Some-IP")}, [16]) = 0
23:30:41.283008 getsockopt(33, SOL_IP, 0x50 /* IP_??? */, "\2\0\1\273\n\31\220\17\0\0\0\0\0\0\0\0", [16]) = 0
23:30:41.283082 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 77
23:30:41.283132 fcntl(77, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
23:30:41.283188 setsockopt(77, SOL_TCP, TCP_NODELAY, [1], 4) = 0
23:30:41.283233 connect(77, {sa_family=AF_INET, sin_port=htons(7300), sin_addr=inet_addr("Some-IP")}, 16) = -1 EINPROGRESS (Operation now in progress)
23:30:41.283415 getsockname(45, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("Some-IP")}, [16]) = 0
23:30:41.283467 getsockopt(45, SOL_IP, 0x50 /* IP_??? */, "\2\0\1\273\n\31\220\17\0\0\0\0\0\0\0\0", [16]) = 0
23:30:41.283521 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
23:30:41.283565 fcntl(79, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
23:30:41.283605 setsockopt(79, SOL_TCP, TCP_NODELAY, [1], 4) = 0
23:30:41.283647 connect(79, {sa_family=AF_INET, sin_port=htons(9930), sin_addr=inet_addr("Some-IP")}, 16) = -1 EINPROGRESS (Operation now in progress)
23:30:41.283723 setsockopt(81, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
23:30:41.283772 close(81)
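When a trace like this is too large to read line by line, tallying calls per syscall and counting error returns such as EAGAIN helps show whether a spinning process is doing real I/O or mostly polling. strace's own `-c` flag prints a per-syscall summary at exit; the minimal sketch below instead works on an already captured `strace -tt` log, assuming the line format shown above (timestamp, call, ` = ` result, optional errno name). It is a reading aid, not a diagnosis.

```python
import re
from collections import Counter
from typing import Tuple

# e.g. "23:30:41.259748 read(3, 0x7f..., 5) = -1 EAGAIN (Resource ...)"
LINE_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d+ (\w+)\(.*\)\s*=\s*(-?\d+)(?: (E[A-Z]+))?"
)

def summarize(log: str) -> Tuple[Counter, Counter]:
    """Return (calls per syscall, errno names seen) for a strace -tt log."""
    calls: Counter = Counter()
    errors: Counter = Counter()
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # lines without a result (e.g. a cut-off "close(81)") are skipped
        name, _ret, errno_name = m.groups()
        calls[name] += 1
        if errno_name:
            errors[errno_name] += 1
    return calls, errors

sample = """\
23:30:41.259668 write(3, "...", 75) = 75
23:30:41.259748 read(3, 0x7feeaed21343, 5) = -1 EAGAIN (Resource temporarily unavailable)
23:30:41.281517 read(140, "...", 7677) = 5840
23:30:41.283233 connect(77, {sa_family=AF_INET, sin_port=htons(7300), sin_addr=inet_addr("Some-IP")}, 16) = -1 EINPROGRESS (Operation now in progress)
23:30:41.283772 close(81)
"""
calls, errors = summarize(sample)
print(calls.most_common())  # [('read', 2), ('write', 1), ('connect', 1)]
print(errors)
```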
Re: Haproxy running on 100% CPU and slow downloads
Agreed on both points.

Thanks
Sachin

On 4/7/16, 11:24 PM, "Willy Tarreau" wrote:
>(...)
Re: Haproxy running on 100% CPU and slow downloads
On Thu, Apr 07, 2016 at 10:59:24PM +0530, Sachin Shetty wrote:
> Hi Willy,
>
> Sorry for the confusion. I wrote to you much earlier in my investigation. I
> will take care going forward.

OK, but in general the point remains, and it's not just for you but for everyone: the mailing list is here to reach around 1000 people at once, so once your message is posted, you have to keep in mind that several of them will start to think about your problem even if they don't respond. This is why it is very important to be transparent about any progress made in parallel investigations or parallel contacts. It's just like when you ask something of two distinct coworkers: one gives you a fast response, and the other one comes the next day and says "I set up a lab yesterday to check what you asked me and I found this last night". You'll feel bad telling him "Oh, I already got the response, thank you anyway".

> Only now did I realize that I had mixed up the version numbers, because it
> seems we have different versions in our cluster.

OK. Similarly, there's nothing wrong with making errors in bug reports; we all do this because we test lots of stuff and end up confusing things. But once you notice something was wrong, simply respond again and fix the information. Reliable versions help eliminate candidate patches and also help people join in saying "same problem here".

> We are now testing with 1.6.4 and trying to fast-track it.

OK, thanks for the feedback!

Willy
Re: Haproxy running on 100% CPU and slow downloads
Hi Willy,

Sorry for the confusion. I wrote to you much earlier in my investigation. I will take care going forward.

Only now did I realize that I had mixed up the version numbers, because it seems we have different versions in our cluster.

We are now testing with 1.6.4 and trying to fast-track it.

Thanks
Sachin

On 4/7/16, 6:31 PM, "Willy Tarreau" wrote:
>(...)
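Mixed versions across a cluster are easy to miss. A minimal sketch for spotting them, assuming you have already collected the `haproxy -vv` output from each host (for example over ssh); the host names and the collection step are hypothetical, and only the `HA-Proxy version <version> <date>` banner format is taken from this thread.

```python
import re
from collections import defaultdict
from typing import Dict, List

BANNER_RE = re.compile(r"HA-Proxy version (\S+) (\d{4}/\d{2}/\d{2})")

def group_by_version(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map 'version (date)' -> sorted host names, from each host's haproxy -vv."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, text in outputs.items():
        m = BANNER_RE.search(text)
        key = f"{m.group(1)} ({m.group(2)})" if m else "unknown"
        groups[key].append(host)
    return {key: sorted(hosts) for key, hosts in groups.items()}

# Hypothetical hosts; the two banners are the ones reported in this thread.
collected = {
    "lb1": "HA-Proxy version 1.5.4 2014/09/02\nCopyright 2000-2014 Willy Tarreau",
    "lb2": "HA-Proxy version 1.5.9 2014/11/25\nCopyright 2000-2014 Willy Tarreau",
    "lb3": "HA-Proxy version 1.5.4 2014/09/02",
}
inventory = group_by_version(collected)
if len(inventory) > 1:
    print("mixed versions:", inventory)
```

Including the build date in the key keeps two builds that share a version number but differ in date from being lumped together.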
Re: Haproxy running on 100% CPU and slow downloads
Hi Sachin,

On Thu, Apr 07, 2016 at 02:21:16PM +0200, Lukas Tribus wrote:
> Hi,
>
> On 05.04.2016 at 09:38, Sachin Shetty wrote:
> >Hi Lukas, Pavlos,
> >
> >Thanks for your response, more info as requested.
> >
> >1. Attached conf with some obfuscation
> >2. Haproxy -vv
> >HA-Proxy version 1.5.4 2014/09/02
> >Copyright 2000-2014 Willy Tarreau
> >
>
> I would upgrade to something more recent; the number of bugfixes
> since 1.5.4 amounts to more than 100!
(...)

I'm just discovering that you opened this thread twice in parallel, once with me in private and once on the ML, resulting in everyone doing the work twice and giving you the same advice twice. Please avoid this in the future; it wastes everyone's time and discourages people from responding to such questions. The place to ask is the ML, and if you contact someone privately, please at least point to the public question so that the response is public and it saves others' valuable time.

Also, the version you reported to me was different:

  HA-Proxy version 1.5.9 2014/11/25

Thanks,
Willy
Re: Haproxy running on 100% CPU and slow downloads
Hi,

On 05.04.2016 at 09:38, Sachin Shetty wrote:
> Hi Lukas, Pavlos,
>
> Thanks for your response, more info as requested.
>
> 1. Attached conf with some obfuscation
> 2. Haproxy -vv
> HA-Proxy version 1.5.4 2014/09/02
> Copyright 2000-2014 Willy Tarreau

I would upgrade to something more recent; the number of bugfixes since 1.5.4 amounts to more than 100!

That said, I've not stumbled upon a particular bug explaining what you are seeing.

My suggestion would be to go back to nbproc 1 (it's easier to troubleshoot), run the process spinning at 100% through strace -tt -p and post the output.

Thanks,

Lukas
Re: Haproxy running on 100% CPU and slow downloads
Hi Lukas, Pavlos,

Thanks for your response, more info as requested.

1. Attached conf with some obfuscation

2. Haproxy -vv
HA-Proxy version 1.5.4 2014/09/02
Copyright 2000-2014 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

3. uname -a
Linux avl-www10.dc.egnyte.lan 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

4. rfc5077-client seems OK
[✔] Prepare tests.
[✔] Run tests without use of tickets.
[✔] Display result set:
│ IP address    │ Try │ Cipher               │ Reuse │ SSL Session ID      │ Master key          │ Ticket │ Answer          │
├───────────────┼─────┼──────────────────────┼───────┼─────────────────────┼─────────────────────┼────────┼─────────────────┤
│ 208.83.105.14 │ 0   │ ECDHE-RSA-AES256-SHA │ ✘     │ 40A2D3E903C2457551… │ B4A08BB73457356AA2… │ ✘      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 1   │ ECDHE-RSA-AES256-SHA │ ✔     │ 40A2D3E903C2457551… │ B4A08BB73457356AA2… │ ✘      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 2   │ ECDHE-RSA-AES256-SHA │ ✔     │ 40A2D3E903C2457551… │ B4A08BB73457356AA2… │ ✘      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 3   │ ECDHE-RSA-AES256-SHA │ ✔     │ 40A2D3E903C2457551… │ B4A08BB73457356AA2… │ ✘      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 4   │ ECDHE-RSA-AES256-SHA │ ✔     │ 40A2D3E903C2457551… │ B4A08BB73457356AA2… │ ✘      │ HTTP/1.1 200 OK │
[✔] Dump results to file.
[✔] Run tests with use of tickets.
[✔] Display result set:
│ IP address    │ Try │ Cipher               │ Reuse │ SSL Session ID      │ Master key          │ Ticket │ Answer          │
├───────────────┼─────┼──────────────────────┼───────┼─────────────────────┼─────────────────────┼────────┼─────────────────┤
│ 208.83.105.14 │ 0   │ ECDHE-RSA-AES256-SHA │ ✘     │ E4559330FD100E69F5… │ 05F768F5574FD27E88… │ ✔      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 1   │ ECDHE-RSA-AES256-SHA │ ✔     │ E4559330FD100E69F5… │ 05F768F5574FD27E88… │ ✔      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 2   │ ECDHE-RSA-AES256-SHA │ ✔     │ E4559330FD100E69F5… │ 05F768F5574FD27E88… │ ✔      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 3   │ ECDHE-RSA-AES256-SHA │ ✔     │ E4559330FD100E69F5… │ 05F768F5574FD27E88… │ ✔      │ HTTP/1.1 200 OK │
│ 208.83.105.14 │ 4   │ ECDHE-RSA-AES256-SHA │ ✔     │ E4559330FD100E69F5… │ 05F768F5574FD27E88… │ ✔      │ HTTP/1.1 200 OK │
[✔] Dump results to file.

On 4/5/16, 12:14 AM, "Lukas Tribus" wrote:
>Hi Sachin,
>(...)

haproxy.sync.conf Description: Binary data
Re: Haproxy running on 100% CPU and slow downloads
Hi Sachin,

(due to email troubles on my side this may look like a new thread, sorry about that)

> We have quite a few regexes and ACLs in our config; is there a way to
> profile haproxy and see what could be slowing it down?

You can use strace for syscalls or ltrace for library calls to see if something in particular shows up, but perf may be the better tool for this job (I've never used it, though).

Like Pavlos said, let's collect some basic information first:

- haproxy -vv output
- uname -a
- configuration (replace proprietary information but leave everything else intact)
- does TLS resumption work correctly? Check with rfc5077-client:

git clone https://github.com/vincentbernat/rfc5077.git
cd rfc5077
make rfc5077-client

./rfc5077-client

There's a chance that it is SSL/TLS related.

Regards,

Lukas
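rfc5077-client is the tool suggested above. For a rough cross-check from a machine with Python 3.7+, the standard `ssl` module can reconnect while offering the previous session and report whether the server resumed it. A minimal sketch: the host name is a placeholder, it pins TLS 1.2 so the session is available right after the handshake, and it only exercises whatever resumption mechanism OpenSSL negotiates by default (session tickets if the server offers them). It does not replace rfc5077-client's separate session-ID and ticket runs.

```python
import socket
import ssl
from typing import List

def resumption_attempts(host: str, port: int = 443, tries: int = 5) -> List[bool]:
    """Connect `tries` times, offering the previous TLS session each time.

    Returns one flag per connection: True if the server resumed the session.
    """
    ctx = ssl.create_default_context()
    # With TLS 1.3, tickets arrive after the handshake and this simple loop
    # could miss them, so stay on TLS 1.2 for the check.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    session = None
    results: List[bool] = []
    for _ in range(tries):
        with socket.create_connection((host, port), timeout=10) as raw:
            with ctx.wrap_socket(raw, server_hostname=host, session=session) as tls:
                results.append(tls.session_reused)
                session = tls.session
    return results

def resumption_ok(flags: List[bool]) -> bool:
    """Expect a full handshake first, then resumption on every later try."""
    return bool(flags) and not flags[0] and all(flags[1:])
```

Usage: `print(resumption_ok(resumption_attempts("www.example.com")))`. The expected pattern mirrors the "Reuse" column of rfc5077-client: ✘ on the first try, ✔ afterwards.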
Re: Haproxy running on 100% CPU and slow downloads
On 04/04/2016 05:23 PM, Sachin Shetty wrote:
> Hi,
>
> I am chasing some weird capacity issues in our setup.
>
> Haproxy, which also does SSL, is forwarding requests to various other
> servers upstream. I am seeing that a simple 100MB file download from our
> upstream components slows down from time to time, going as low as 1MBPS,
> whereas it is usually greater than 100MBPS. When this happens, I tried
> downloading the file from the upstream component bypassing haproxy from
> the same box, and that is fast enough: 100MBPS. So it seems like
> haproxy is getting jammed on something.

Did you use HTTPS on the server as well?

>
> The only suspicious thing I see is that haproxy will be spinning at 100%
> CPU. So we added nbproc 4 and I still see the same pattern: when the
> speed drops, all haproxy processes are hitting 80-100%. The request rate
> when the speed drops is about 5K/minute, which is only 2X the requests
> when things are normal and download speeds are fine.

What are the user and sys levels of CPU?

>
> We have quite a few regexes and ACLs in our config; is there a way to
> profile haproxy and see what could be slowing it down?
>

You'd better include the actual config; it will increase the level of support that you may get.

Cheers,
Pavlos
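Pavlos's question separates time spent in haproxy's own code (user) from time the kernel spends on its behalf (sys), such as syscalls and network I/O. `top` or `pidstat -u` show this split directly. As a minimal Linux-only sketch, the same numbers can be derived from `/proc/<pid>/stat`, whose 14th and 15th fields are utime and stime in clock ticks (see proc(5)); the sampling interval and any PID you pass in are up to you.

```python
import os
import time
from typing import Tuple

def cpu_ticks(stat_line: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the last ')'. Field 3 (state) then has index 0, which
    # puts utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]), int(rest[12])

def user_sys_percent(pid: int, interval: float = 1.0) -> Tuple[float, float]:
    """Sample a process twice and return its (user %, sys %) of one CPU."""
    hz = os.sysconf("SC_CLK_TCK")

    def read() -> Tuple[int, int]:
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks(f.read())

    u0, s0 = read()
    time.sleep(interval)
    u1, s1 = read()
    scale = 100.0 / (hz * interval)
    return (u1 - u0) * scale, (s1 - s0) * scale
```

Usage: `print(user_sys_percent(12345))`, where 12345 stands for one haproxy PID; with nbproc above 1, run it once per process. A high sys share points toward the kernel side (syscalls, network stack), while a high user share points toward haproxy's own processing, such as SSL, regexes, and ACLs.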