Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
The problem with nginx is that it doesn't support chunked encoding. Since that is what we are after, we can't use nginx until it supports it, or until we can get rid of chunked encoding. So posting about how well it is working for you doesn't really help with our issue. Thanks though.

BR, Timh

2010/3/19 duncan hall <dun...@viator.com>:
> Throw me in as a fourth on this one. I use nginx 0.8.34 for gzip compression, RAM caching of static content and SSL offload. All very simple to configure, with low overhead. All requests, HTTP and HTTPS, go to nginx and are then forwarded to HAProxy 1.4.2 as plain HTTP.
>
> Regards, Duncan
>
> Harvey Yau wrote:
>> I can third this - nginx + haproxy works extremely well. I wish haproxy supported SSL directly. I realize it's not within the design goals of haproxy, but the need for this is out there. Good thing nginx + haproxy works well enough.
>>
>> -- Harvey
>>
>> On 3/18/10 3:29 PM, Nicholas Hadaway wrote:
>>> I can second this comment and say that it works extremely well... nginx operates very nicely as an SSL offloading device. I am right now using nginx 0.8.33 (soon to bump up to 0.8.34) and HAProxy 1.4.2 in production, and things work very well for me.
>>>
>>> -Nick
>>>
>>>> Maybe it's worth a try for you to use nginx as a stunnel replacement? Its performance is quite good, and the config can be kept very short too, for only accepting SSL traffic and directing it to haproxy.
>>>>
>>>> kind regards, Malte

--
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Chunked encoding support... http://github.com/agentzh/chunkin-nginx-module

-nick

On 3/22/2010 4:57 AM, Timh Bergström wrote:
> The problem with nginx is that it doesn't support chunked encoding. Since that is what we are after, we can't use nginx until it supports it, or until we can get rid of chunked encoding. So posting about how well it is working for you doesn't really help with our issue. Thanks though.
>
> BR, Timh
>
> [...]
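For anyone who wants to try the nginx-in-front-of-haproxy setup described in this thread, here is a minimal sketch of an nginx 0.8-era SSL-offload server block; the paths, addresses and ports are purely illustrative, and chunked request bodies would still need the chunkin module linked above:

    # Hypothetical nginx.conf fragment: terminate SSL, forward plain HTTP to haproxy.
    server {
        listen              443;
        ssl                 on;
        ssl_certificate     /etc/nginx/ssl/site.pem;
        ssl_certificate_key /etc/nginx/ssl/site.key;

        location / {
            # haproxy frontend; the address is an example
            proxy_pass       http://127.0.0.1:8080;
            # preserve the client address for the backends
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }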
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi,

>>> Did you observe anything special about the CPU usage ? Was it lower than with 1.3 ? If so, it would indicate some additional delay somewhere. If it was higher, it could indicate that the Transfer-Encoding parser takes too many cycles, but my preliminary tests proved it to be quite efficient.
>>
>> I did not notice anything special about CPU usage. It seems to be around 2-4% with both versions. When checking the munin graphs this morning, I did however notice that the "connection resets received" counter from netstat -s was increasing a lot more with 1.4. This led me to look at the log more closely, and there are a lot of new errors that look something like this:
>>
>> w.x.y.z:4004 [15/Mar/2010:09:50:51.190] fe_xxx be_yyy/upload-srvX 0/0/0/-1/62 502 391 - PR-- 9/6/6/3/0 0/0 "PUT /dav/filename.ext HTTP/1.1"
>
> Interesting ! It looks like haproxy has aborted because the server returned an invalid response. You can check that using socat on the stats socket. For instance :
>
>     echo "show errors" | socat stdio unix-connect:/var/run/haproxy.stat
>
> If you don't get anything, then it's something else :-/

Unfortunately, the "show errors" output came back empty, so I guess it was something else. The good news is that I gave haproxy 1.4.2 a try today, and the 502/PR errors with PUT/TE:chunked requests have now vanished. So thanks for solving this. I'm not sure which of the bugs I was hitting, but it does not really matter since it now seems to be fixed.

So now that I have a working haproxy 1.4, I went on to try out "option http-server-close", but I hit a problem with our stunnel instances (patched with stunnel-4.22-xforwarded-for.diff). Stunnel does not support keep-alive, so only the first HTTP request in a keep-alive session gets the X-Forwarded-For header added (insert Homer "doh!" here :). On reflection, I guess this is the expected behaviour given what stunnel is actually supposed to do. So, for now I'll stick with "option httpclose" a while longer...

Keep up the good work!

Best regards
Erik

--
Erik Gulliksson, erik.gulliks...@diino.net
System Administrator, Diino AB
http://www.diino.com
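A note for anyone trying the "show errors" command above: it requires a UNIX stats socket to be declared in the global section of haproxy.cfg. A minimal sketch (the path and mode are illustrative, and the path must match what socat is pointed at):

    global
        stats socket /var/run/haproxy.stat mode 600

    # then query it:
    $ echo "show errors" | socat stdio unix-connect:/var/run/haproxy.stat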
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hello,

> [...]
>
> So now that I have a working haproxy 1.4, I went on to try out "option http-server-close", but I hit a problem with our stunnel instances (patched with stunnel-4.22-xforwarded-for.diff). Stunnel does not support keep-alive, so only the first HTTP request in a keep-alive session gets the X-Forwarded-For header added (insert Homer "doh!" here :). On reflection, I guess this is the expected behaviour given what stunnel is actually supposed to do. So, for now I'll stick with "option httpclose" a while longer...

Maybe it's worth a try for you to use nginx as a stunnel replacement? Its performance is quite good, and the config can be kept very short too, for only accepting SSL traffic and directing it to haproxy.

kind regards,
Malte
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi Erik,

On Thu, Mar 18, 2010 at 02:29:46PM +0100, Erik Gulliksson wrote:
> Unfortunately, the "show errors" output came back empty, so I guess it was something else. The good news is that I gave haproxy 1.4.2 a try today, and the 502/PR errors with PUT/TE:chunked requests have now vanished. So thanks for solving this. I'm not sure which of the bugs I was hitting, but it does not really matter since it now seems to be fixed.

OK, so very likely it's the same problem I fixed yesterday using Bernhard's captures.

> So now that I have a working haproxy 1.4, I went on to try out "option http-server-close", but I hit a problem with our stunnel instances (patched with stunnel-4.22-xforwarded-for.diff). Stunnel does not support keep-alive, so only the first HTTP request in a keep-alive session gets the X-Forwarded-For header added (insert Homer "doh!" here :). On reflection, I guess this is the expected behaviour given what stunnel is actually supposed to do.

Yes indeed, it's expected. Stunnel is not designed to manipulate application data, and the patch only adds the header to the first request of a connection. Maybe we should implement some XCLIENT-like protocol between stunnel and haproxy to report the address of the client of the TCP connection.

> So, for now I'll stick with "option httpclose" a while longer...

You may get better results with "option forceclose" now, as it will release the server connection earlier.

Regards,
Willy
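For reference, the two options Willy compares differ roughly as follows in haproxy 1.4; a configuration sketch, not a complete config:

    defaults
        mode http
        # option httpclose  : add "Connection: close" in both directions and
        #                     let each side close its connection itself
        # option forceclose : also actively close the server-side connection
        #                     as soon as the server has finished responding
        option forceclose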
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi Malte,

>> So now that I have a working haproxy 1.4, I went on to try out "option http-server-close", but I hit a problem with our stunnel instances (patched with stunnel-4.22-xforwarded-for.diff). Stunnel does not support keep-alive, so only the first HTTP request in a keep-alive session gets the X-Forwarded-For header added (insert Homer "doh!" here :). On reflection, I guess this is the expected behaviour given what stunnel is actually supposed to do. So, for now I'll stick with "option httpclose" a while longer...
>
> Maybe it's worth a try for you to use nginx as a stunnel replacement? Its performance is quite good, and the config can be kept very short too, for only accepting SSL traffic and directing it to haproxy.

Thanks for the suggestion. I did give nginx a try in a lab setup, but for our application it did not work out with the "Transfer-Encoding: chunked" header, as nginx returns "411 Content-Length required" for such requests. I also tried Pound, but got a similar error. There may be other products out there I have not yet tried, however.

What I am looking for in an SSL-decoding solution:

- support for TE:chunked
- HTTP keep-alive
- an option to select the SSL engine (for h/w acceleration)
- soft reconfiguration (something like haproxy's -sf)
- HTTP header manipulation
- open source, free, robust and efficient

This is beginning to sound like haproxy with SSL support :)

Best regards
Erik Gulliksson

--
Erik Gulliksson, erik.gulliks...@diino.net
System Administrator, Diino AB
http://www.diino.com
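To make the chunked-encoding requirement concrete, this is roughly what such a request looks like on the wire (per RFC 2616; host and path invented): there is no Content-Length header, the body arrives as hex-sized chunks, and a zero-sized chunk terminates it. Proxies that cannot parse this framing, like the stock nginx of that era, answer 411 instead:

    PUT /dav/filename.ext HTTP/1.1
    Host: upload.example.com
    Transfer-Encoding: chunked

    5
    hello
    6
     world
    0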
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
> So now that I have a working haproxy 1.4, I went on to try out "option http-server-close", but I hit a problem with our stunnel instances (patched with stunnel-4.22-xforwarded-for.diff). Stunnel does not support keep-alive, so only the first HTTP request in a keep-alive session gets the X-Forwarded-For header added (insert Homer "doh!" here :). On reflection, I guess this is the expected behaviour given what stunnel is actually supposed to do. So, for now I'll stick with "option httpclose" a while longer...

Maybe try using some light web server like Nginx or Lighttpd as the SSL proxy instead?

--
Mariusz Gronczewski (XANi) xani...@gmail.com
GnuPG: 0xEA8ACE64 http://devrandom.pl
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi Willy,

On Thu, Mar 18, 2010 at 3:08 PM, Willy Tarreau <w...@1wt.eu> wrote:
> OK, so very likely it's the same problem I fixed yesterday using Bernhard's captures.

Great! Thanks to Bernhard as well then, for providing you with the captures.

> Yes indeed, it's expected. Stunnel is not designed to manipulate application data, and the patch only adds the header to the first request of a connection. Maybe we should implement some XCLIENT-like protocol between stunnel and haproxy to report the address of the client of the TCP connection.

I would love to see a feature that makes this work. I know too little about stunnel and haproxy internals to have an opinion on the best/simplest way to implement it.

>> So, for now I'll stick with "option httpclose" a while longer...
>
> You may get better results with "option forceclose" now, as it will release the server connection earlier.

OK, I will try enabling "option forceclose" as well. Again, thanks for all the help.

Best regards
Erik

--
Erik Gulliksson, erik.gulliks...@diino.net
System Administrator, Diino AB
http://www.diino.com
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi Erik,

On Mon, Mar 15, 2010 at 10:27:38AM +0100, Erik Gulliksson wrote:
> Hi Willy,
>
> Thanks for your elaborate answer.
>
>> Did you observe anything special about the CPU usage ? Was it lower than with 1.3 ? If so, it would indicate some additional delay somewhere. If it was higher, it could indicate that the Transfer-Encoding parser takes too many cycles, but my preliminary tests proved it to be quite efficient.
>
> I did not notice anything special about CPU usage. It seems to be around 2-4% with both versions. When checking the munin graphs this morning, I did however notice that the "connection resets received" counter from netstat -s was increasing a lot more with 1.4. This led me to look at the log more closely, and there are a lot of new errors that look something like this:
>
> w.x.y.z:4004 [15/Mar/2010:09:50:51.190] fe_xxx be_yyy/upload-srvX 0/0/0/-1/62 502 391 - PR-- 9/6/6/3/0 0/0 "PUT /dav/filename.ext HTTP/1.1"

Interesting ! It looks like haproxy has aborted because the server returned an invalid response. You can check that using socat on the stats socket. For instance :

    echo "show errors" | socat stdio unix-connect:/var/run/haproxy.stat

If you don't get anything, then it's something else :-/

> This is only happening for a few of the PUT requests; most requests seem to get proxied successfully. I will try to reproduce this in a more controlled lab setup where I can sniff HTTP headers to see what is actually sent in the request.

That would obviously help too :-)

>> No, I've run POST requests (very similar to PUT), except that there was no Transfer-Encoding in the requests. It's interesting that you're doing that in the request, because Apache removed support for TE:chunked a few years ago because there were no users. Also, most of my POST tests were not performance related.
>
> Interesting. We do use Apache for parts of this application on the backend side, although PUT requests are handled by an in-house developed Erlang application.

OK.

>> A big part has changed: in the previous version, haproxy did not care at all about the payload, it only saw headers. Now, with keep-alive support, it has to find request/response boundaries, and as such must parse the Transfer-Encoding and Content-Length headers. However, transfer encoding is nice to components such as haproxy because it's very cheap: haproxy reads a chunk size (one line), then forwards that many bytes, then reads a new chunk size, etc... So this is really a cheap operation. My tests have shown no issue at gigabit speeds with just a few bytes per chunk.
>>
>> I suspect that the application tries to use the chunked encoding to simulate bidirectional access. In this case, it might be waiting for data pending in the kernel buffers which were sent by haproxy with the MSG_MORE flag, indicating that more data will follow (and so you should observe a low CPU usage). Could you please do a small test: in src/stream_sock.c, please comment out line 616 :
>>
>>  615         /* this flag has precedence over the rest */
>>  616 //      if (b->flags & BF_SEND_DONTWAIT)
>>  617                 send_flag &= ~MSG_MORE;
>>
>> It will unconditionally disable the use of MSG_MORE. If this fixes the issue for you, I'll probably have to add an option to disable this packet merging for very specific applications.
>
> I tried to comment out the line above as instructed, but it made no noticeable change. As stated above, I will try to reproduce the problem in a lab setup. This may be an issue with our application rather than haproxy.

OK, thanks for testing !

Best regards,
Willy
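Some background on the flag involved in Willy's test: MSG_MORE is a Linux-specific send(2) flag telling the kernel that more data will follow, so it may hold back a partial TCP segment to coalesce it with the next write. A stand-alone sketch of the mechanism, not haproxy code (the helper and its name are invented):

    /* Sketch of MSG_MORE usage (Linux). Not haproxy code. */
    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t forward_block(int fd, const void *buf, size_t len, int more_follows)
    {
        int flags = MSG_NOSIGNAL;

        /* With MSG_MORE, the kernel may delay a partial segment while
         * waiting to merge it with the next write. A peer that blocks
         * until it has received exactly these bytes will then stall,
         * which matches the behaviour suspected above. */
        if (more_follows)
            flags |= MSG_MORE;

        return send(fd, buf, len, flags);
    }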
Re: Throughput degradation after upgrading haproxy from 1.3.22 to 1.4.1
Hi Erik,

On Fri, Mar 12, 2010 at 11:08:08AM +0100, Erik Gulliksson wrote:
> Hi!
>
> First, I'd like to thank Willy and the other haproxy contributors for bringing this wonderful piece of software into the world :)

Thanks !

> For the last 2 years we have been running haproxy 1.3 successfully to load balance our frontend applications and storage services. Most of the requests passing through our haproxy instances are WebDAV commands. Since some sought-after new features were announced in the new stable 1.4 branch, yesterday we gave it a go and upgraded haproxy from 1.3.22 to 1.4.1 in our production environment (we simply replaced the active binary using the -sf switch). After the new version was deployed, our incoming traffic slowly dropped from approximately 150 Mbps to 80 Mbps (as ongoing requests were still being processed by 1.3.22). The configuration file was not changed between the two versions, so we have not yet started using any of the new config options for 1.4 (http-server-close etc). Because of the drop in throughput we have now rolled back to 1.3.22 (and traffic levels are back to normal).

Did you observe anything special about the CPU usage ? Was it lower than with 1.3 ? If so, it would indicate some additional delay somewhere. If it was higher, it could indicate that the Transfer-Encoding parser takes too many cycles, but my preliminary tests proved it to be quite efficient.

> What differentiates our service from most other online services is that we are more of a content consumer than a content provider. The requests that generate our traffic volume are mostly large and small PUT requests with "Transfer-Encoding: chunked". Is this type of request included in any of your tests or benchmarks?

No, I've run POST requests (very similar to PUT), except that there was no Transfer-Encoding in the requests. It's interesting that you're doing that in the request, because Apache removed support for TE:chunked a few years ago because there were no users. Also, most of my POST tests were not performance related.

> Do you have a clue of what might have changed in the code base to cause this behaviour? Any suggestions for where to go from here (other than sticking with 1.3 :) are greatly appreciated.

A big part has changed: in the previous version, haproxy did not care at all about the payload, it only saw headers. Now, with keep-alive support, it has to find request/response boundaries, and as such must parse the Transfer-Encoding and Content-Length headers. However, transfer encoding is nice to components such as haproxy because it's very cheap: haproxy reads a chunk size (one line), then forwards that many bytes, then reads a new chunk size, etc... So this is really a cheap operation. My tests have shown no issue at gigabit speeds with just a few bytes per chunk.

I suspect that the application tries to use the chunked encoding to simulate bidirectional access. In this case, it might be waiting for data pending in the kernel buffers which were sent by haproxy with the MSG_MORE flag, indicating that more data will follow (and so you should observe a low CPU usage). Could you please do a small test: in src/stream_sock.c, please comment out line 616 :

 615         /* this flag has precedence over the rest */
 616 //      if (b->flags & BF_SEND_DONTWAIT)
 617                 send_flag &= ~MSG_MORE;

It will unconditionally disable the use of MSG_MORE. If this fixes the issue for you, I'll probably have to add an option to disable this packet merging for very specific applications.

Regards,
Willy
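To illustrate why the chunk parsing Willy describes is cheap, here is a stand-alone sketch of the framing loop (read a hex size line, forward exactly that many bytes, repeat until the zero chunk). It filters stdin to stdout, ignores trailers, and is in no way haproxy's actual parser:

    /* Stand-alone sketch of chunked-encoding framing. Not haproxy code. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char line[128];
        char buf[4096];

        /* One iteration per chunk: a hex size line, then the payload. */
        while (fgets(line, sizeof(line), stdin)) {
            long size = strtol(line, NULL, 16); /* stops at ';' extensions */
            if (size <= 0)
                break;                          /* "0" marks the last chunk */

            while (size > 0) {                  /* forward exactly `size` bytes */
                size_t want = (size_t)size < sizeof(buf) ? (size_t)size : sizeof(buf);
                size_t got = fread(buf, 1, want, stdin);
                if (got == 0)
                    return 1;                   /* truncated input */
                fwrite(buf, 1, got, stdout);
                size -= (long)got;
            }

            if (!fgets(line, sizeof(line), stdin))
                return 1;                       /* consume CRLF after payload */
        }
        return 0;
    }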