RE: VM benchmarks
I'm wondering what the difference would be between the standard slow e1000
virtual network card and the fast paravirtualized vmxnet3 virtual network
card. In theory, the latter one should be much, much faster.

--
With kind regards,

Angelo Höngens
Systems Administrator

--
NetMatch
tourism internet software solutions

Ringbaan Oost 2b
5013 CA Tilburg
T: +31 (0)13 5811088
F: +31 (0)13 5821239
mailto:a.hong...@netmatch.nl
http://www.netmatch.nl

-----Original Message-----
From: Les Stroud [mailto:l...@lesstroud.com]
Sent: Wednesday, October 27, 2010 21:55
To: Ariel
Cc: haproxy
Subject: Re: VM benchmarks

Check out this thread I had earlier in the month on the same topic:

http://www.formilux.org/archives/haproxy/1010/3910.html

Bottom line: VMware will lower your upper transaction limit by a
significant amount (like an order of magnitude). The software drivers
underneath the network stack and the system stack add enough overhead to
reduce your maximum transaction ceiling to around 6000 trans/sec on
haproxy (this is without a backend constraint). On a hardware device, I
am seeing much higher numbers (50k).

LES

On Oct 26, 2010, at 10:38 AM, Ariel wrote:

> Does anyone know of studies done comparing haproxy on dedicated
> hardware vs a virtual machine? Or perhaps some virtual-machine-specific
> considerations?
>
> -a
Re: VM benchmarks
On Thu, Oct 28, 2010 at 07:10:32AM +0000, Angelo Höngens wrote:
> I'm wondering what the difference would be between the standard slow
> e1000 virtual network card and the fast paravirtualized vmxnet3 virtual
> network card. In theory, the latter one should be much, much faster.

We've tested that at Exceliance. Yes, it's a lot faster, but still a lot
slower than the native machine. To give you an idea, you can get about
6000 connections per second under ESX on a machine that natively supports
between 25000 and 40000 depending on the NICs.

Regards,
Willy
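The connections-per-second figures quoted above come from real load
injectors against haproxy; the measurement idea itself can be sketched in
a few lines. The toy below (purely illustrative, my own construction, not
how these numbers were produced) opens and closes N TCP connections to a
trivial local listener and divides by elapsed time. Loopback numbers say
nothing about real NIC or hypervisor overhead; the point is only the
shape of the measurement.

```python
import socket
import threading
import time

def run_listener(sock):
    # Accept connections and close them immediately, like a trivial server.
    while True:
        try:
            conn, _ = sock.accept()
            conn.close()
        except OSError:
            break  # listener socket closed, stop the thread

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))   # bind to any free port
listener.listen(1024)             # large accept backlog
port = listener.getsockname()[1]
threading.Thread(target=run_listener, args=(listener,), daemon=True).start()

N = 2000
start = time.monotonic()
for _ in range(N):
    c = socket.create_connection(("127.0.0.1", port))
    c.close()
elapsed = time.monotonic() - start
rate = N / elapsed
print(f"{rate:.0f} connections/second")
listener.close()
```

A real test would use a tool like injectl464 or ab from a separate
machine, since measuring from the same host hides most of the cost being
discussed here.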
stats page errors column
List,

I didn't immediately see this in the docs: what types of errors (CD, sQ,
etc.) are included in the error columns labeled "conn" and "resp" on the
haproxy stats page?

Thanks.
-Joe

Name: Joseph A. Williams
Email: j...@joetify.com
Blog: http://www.joeandmotorboat.com/
Twitter: http://twitter.com/williamsjoe
Re: Slow TCP open on haproxy
On Tue, 2010-10-26 at 19:24 +0200, Willy Tarreau wrote:
> Hi Maxime,

Hi Willy

> On Tue, Oct 26, 2010 at 12:47:37PM -0400, Maxime Ducharme wrote:
> > Hi guys
> >
> > I am new to the haproxy list; my experience with this software is very
> > good so far. I have a question about socket tuning.
> >
> > We have a web site running on 10 different httpd servers with 2
> > haproxy boxes in front. We configured 3 IPs on each haproxy; we get
> > about 2200 req/s each, and peak time is 3500 req/s each. Current load
> > is very low on the haproxy boxes, but we have noticed some slow access
> > to the website. Doing analysis, we found out that sometimes opening a
> > TCP socket on the haproxy box is slower than opening a socket directly
> > on one of the httpd behind it.
> >
> > The configuration is quite simple, here is a snippet:
> >
> >   global
> >       maxconn 32768
> >       nbproc 8
> >
> >   defaults
> >       log global
> >       retries 3
> >       maxconn 32768
> >       contimeout 5000
> >       clitimeout 50000
> >       srvtimeout 50000
> >
> >   listen weblb1 1.1.1.1:80
> >       bind 1.1.1.2:80
> >       bind 1.1.1.3:80
> >       mode http
> >       balance roundrobin
> >       option forwardfor
> >       option httpchk HEAD / HTTP/1.0
> >       option httpclose
> >       stats enable
> >       server web1 1.1.2.1:80 weight 10 check port 80
> >       ..
> >       server web10 1.1.2.10:80 weight 10 check port 80
> >
> > We set nbproc to the same number of CPU cores we have.
> >
> > We noticed the problem by tracing an HTTP request with curl, e.g.:
> >
> >   15:14:13.684549 * About to connect() to www.website.com port 80 (#0)
> >   15:14:13.685620 *   Trying 1.1.1.1... connected
> >   -- 3 seconds here to open the TCP connection
> >   15:14:16.796281 * Connected to www.website.com (1.1.1.1) port 80 (#0)
> >   15:14:16.797173 > GET / HTTP/1.1
> >   -- httpd replies here in less than 1 second
>
> A 3 second delay is a typical SYN retransmit.

Makes sense.

> > This issue happens sometimes, not always. My question: can someone
> > point me in a direction for socket optimization / debugging? I am
> > currently unable to explain why it is slow. I know this is not
> > hardware related since these are very powerful boxes; I believe some
> > tuning will make a big difference. Maybe we have kernel tuning to do
> > here; if someone can enlighten me it would be much appreciated.
>
> Two things to look for:
>
> - if you have ip_conntrack / nf_conntrack loaded, either you have to
>   unload it, or properly tune it for your usage (I'd recommend the
>   former, it's easier).

Good point; it's not loaded.

> - check net.core.somaxconn. If it's 128, then your TCP stack is not
>   tuned for a high connection rate, and you're surely dropping incoming
>   connections from time to time. Try to first increase that single
>   parameter to 10000, restart haproxy and check if it changes anything.

It was set to 128. We raised the value to 10000 and we see better results
now. A new problem appeared though:

  Oct 28 19:07:40 v-2-fg09-d861-15 kernel: [735810.205858] TCP: drop open request from 1.1.1.1/42274
  Oct 28 19:07:45 v-2-fg09-d861-15 kernel: [735815.237132] TCP: drop open request from 1.1.1.2/2847
  Oct 28 19:07:50 v-2-fg09-d861-15 kernel: [735820.276368] TCP: drop open request from 1.1.1.3/3925
  Oct 28 19:07:55 v-2-fg09-d861-15 kernel: [735825.308858] TCP: drop open request from 1.1.1.4/49952
  ...

I also see unreplied SYNs in netstat:

  # netstat -an | grep SYN_RECV | grep -cv grep
  1426

Now I am taking a look at the tcp_max_syn_backlog value. I am thinking of
raising it as well, but I would like to have your opinion. We see this
issue when req/s reaches 2300/s, only at the peak time of day. The rest of
the day is OK and response time is excellent.

> Note that you don't need 8 processes with that load; it will be harder
> to debug, health checks will not be synced, and stats will only be
> per-process.

Good, we now run 1 instance only.

Another question: can I enable stats on a particular IP?

> Yes, simply put the "stats enable" statement in its own listen section.

Thanks.

> Last, with version 1.4, you can also reduce the connection rate by using
> "option http-server-close" instead of "option httpclose". It will enable
> keep-alive on the client side. Do that only when you have fixed your
> issues, because doing so can mask the problem without fixing it, and
> you'll get it again later.

Thanks for this also; I will look into this one afterwards.

> Regards,
> Willy

Have a nice day,
Maxime
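For reference, the two kernel parameters discussed in this exchange can be
made persistent in /etc/sysctl.conf. The values below are illustrative
(10000 is simply a figure of the order discussed in the thread); the right
numbers depend on your peak connection rate:

```ini
# /etc/sysctl.conf fragment -- apply with "sysctl -p"

# Accept-queue limit for listening sockets. The old default of 128 is
# far too low for thousands of connections per second.
net.core.somaxconn = 10000

# Queue of connections in SYN_RECV state (half-open). Raising it reduces
# "TCP: drop open request" messages under SYN bursts.
net.ipv4.tcp_max_syn_backlog = 10000
```

Note that haproxy must be restarted after raising somaxconn, since the
listen backlog is fixed when the socket is created.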
Re: VM benchmarks
On Thursday, October 28, 2010 at 15:58:55, Ariel wrote:
> Hi Cyril,
> My test wasn't designed to look at higher load averages (many users at
> once) since the problem I was looking at was just increased latency for
> all requests.

You mean that with only 1 request at a time through haproxy you obtain a
response in 150ms where a direct request gives a response in 10 to 30ms?
I agree, this looks really strange.

I reproduced nearly the same environment as you described and could not
reproduce this latency (only 1 nginx instance in my case). To be clear on
the config I used (I didn't take time to have a clean and tuned
installation):

- 1 server running VirtualBox 3.2.8
  OS: Mandriva Cooker (not recently updated)
  Kernel: Linux localhost 2.6.35.6-server-1mnb #1 SMP ... x86_64 GNU/Linux
  CPU: Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
  Memory: 4GB
  IP: 192.168.0.128

  With 2 small VMs based on a Debian Lenny 5.0.6:
  Kernel: 2.6.26-2-amd64 #1 SMP ... x86_64 GNU/Linux

  - Instance 1: 1 CPU allocated, 512MB memory, IP 192.168.0.23
    HAProxy 1.4.8 installed with your configuration (only one backend
    server, pointing to the second VM instance)

  - Instance 2: 1 CPU allocated, 384MB memory, IP 192.168.0.24
    nginx 0.7.65 embedding your ajax test

- 1 laptop used as the client
  OS: Ubuntu 10.10
  Kernel: 2.6.35-22-generic #35-Ubuntu SMP ... i686 GNU/Linux
  Memory: 2GB

TEST 1: Firefox/Firebug
- direct access to nginx via 192.168.0.24: Firebug shows response times
  of about 2ms
- access through haproxy via 192.168.0.23: response times are about 3ms

TEST 2: Chromium/Firebug Lite
- direct access to nginx via 192.168.0.24: response times between 10 and
  15ms
- access through haproxy via 192.168.0.23: response times still between
  10 and 15ms

TEST 3: using ab for 10000 requests with a concurrency of 1 (no keepalive)

- via nginx: ab -n10000 -c1 http://192.168.0.24/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%   2
    66%   2
    75%   2
    80%   2
    90%   2
    95%   2
    98%   2
    99%   3
   100%  16 (longest request)

- via haproxy: ab -n10000 -c1 http://192.168.0.23/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%   3
    66%   3
    75%   4
    80%   4
    90%   4
    95%   4
    98%   5
    99%   5
   100%  14 (longest request)

The results are similar.

TEST 4: using ab for 10000 requests with a concurrency of 10 (no keepalive)

- via nginx: ab -n10000 -c10 http://192.168.0.24/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%   6
    66%   6
    75%   6
    80%   6
    90%   7
    95%   8
    98%   8
    99%   9
   100%  25 (longest request)

- via haproxy: ab -n10000 -c10 http://192.168.0.23/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%  18
    66%  21
    75%  23
    80%  24
    90%  30
    95%  35
    98%  40
    99%  43
   100%  56 (longest request)

OK, it starts to be less responsive, but this is because the VirtualBox
server now uses nearly 100% of its 2 CPU cores. This is still far from
what you observe.

TEST 5: using ab for 10000 requests with a concurrency of 100 (no
keepalive), just to be quite aggressive with the VMs.

- via nginx: ab -n10000 -c100 http://192.168.0.24/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%  54
    66%  55
    75%  57
    80%  65
    90%  76
    95%  78
    98%  79
    99%  81
   100% 268 (longest request)

- via haproxy: ab -n10000 -c100 http://192.168.0.23/ajax.txt
  Percentage of the requests served within a certain time (ms)
    50%  171
    66%  184
    75%  192
    80%  198
    90%  217
    95%  241
    98%  287
    99%  314
   100% 3153 (longest request)

I can't help you much more, but I hope these results will give you some
points of comparison. What is the hardware of your VirtualBox server?

--
Cyril Bonté
Question regarding cookie
Hi all,

Let's say I have 2 sites that are served by the same haproxy instance.

- If I go directly to site1, all is fine: I'm using one of the servers of
  the site1 backend.
- If I go directly to site2, all is fine: I'm using one of the servers of
  the site2 backend.
- But if, from site1, I click a link to go to site2, it won't work.

Instances #1 and #2 share physical servers but with different cookies,
because they use different backends (some are Apache, others are Tomcat).

What I'm thinking is that if I open a browser and go directly to site1 or
site2, all is fine since I have no cookie. But if I click through to site2
from within site1, I probably already have the cookie for site1 in the
request, and I end up with a FILE not found.

I went through the docs and I'm pretty sure "cookie ... rewrite" or
similar will help me, but I would like to have your input on this kind of
setup. Which should I use:

  cookie SERVERID indirect
or
  cookie SERVERID rewrite

Another question: I use this command to dump HTTP data with tcpdump, but
I'm sure there is a simpler one:

  tcpdump -s 0 -A -i any 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

Thanks for your input!

--
Guillaume Bourque, B.Sc.,
consultant, infrastructures technologiques libres !
Logisoft Technologies inc. http://www.logisoftech.com
514 576-7638, http://ca.linkedin.com/in/GuillaumeBourque/fr
Re: stats page errors column
Hi Joe,

On Thu, Oct 28, 2010 at 09:24:42AM -0700, Joe Williams wrote:
> List, I didn't immediately see this in the docs. What types of errors
> (CD, sQ, etc) are included in the error column labeled as conn and resp
> on the haproxy stats page?

For the "conn" column, those are the failed connection attempts (timeouts
or rejects); normally they'll be sC and SC. For the "resp" column, all the
ones that are caused by the server after the connection was established;
typically sH, SH, and PH when the server returns crap.

Up to and including 1.4.8, there was a bug resulting in tcp-request rules
incrementing the "resp" column instead of the "req" column when blocking.
This was fixed in 1.4.9.

Hoping this helps,
Willy
Re: Question regarding cookie
Hi Guillaume,

On Thu, Oct 28, 2010 at 05:56:20PM -0400, Guillaume Bourque wrote:
> Hi all,
> Let's say I have 2 sites that are served by the same haproxy instance.
> If I go directly to site1, all is fine: I'm using one of the servers of
> the site1 backend. If I go directly to site2, all is fine too. But if,
> from site1, I click a link to go to site2, it won't work.
> (...)
> But if I click through to site2 from within site1, I probably already
> have the cookie for site1 in the request, and I end up with a FILE not
> found.

No, because if your sites have different names, the browser takes extreme
care not to send the cookie to the wrong one. This is a big security
concern before anything else. If your sites are in fact sub-directories of
the same host name, you'd probably prefer to use different cookie names,
so that the browser can learn them separately. Or maybe there is something
special in your setup that I did not get?

Regards,
Willy
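A minimal sketch of the "different cookie names" approach Willy suggests,
with one persistence cookie per site so the browser learns them
separately. All names and addresses here are hypothetical, not taken from
Guillaume's actual setup:

```
listen site1
    bind 192.168.0.1:80
    mode http
    balance roundrobin
    # site1 gets its own cookie name
    cookie SRV_SITE1 insert indirect nocache
    server ap1 10.0.0.1:80 cookie ap1 check
    server ap2 10.0.0.2:80 cookie ap2 check

listen site2
    bind 192.168.0.2:80
    mode http
    balance roundrobin
    # site2 uses a distinct cookie name, so a stale site1 cookie
    # can never be matched against site2's servers
    cookie SRV_SITE2 insert indirect nocache
    server tc1 10.0.0.1:8080 cookie tc1 check
    server tc2 10.0.0.2:8080 cookie tc2 check
```

With "insert indirect", haproxy adds the cookie itself and strips it
before forwarding to the server, which avoids interfering with any
application cookies.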
Re: Troubleshooting response times
Hi Guy,

On Wed, Oct 27, 2010 at 12:49:16PM -0700, g...@desgames.com wrote:
> Hi all,
> We're trying to narrow down the source of delays we're seeing in
> response times from our web cluster. Using Firebug, we're seeing that
> scripts are taking around 10-50 ms to complete (we're returning that in
> the response data), but the total response time shown by Firebug is
> anywhere between 100ms all the way up to, in some cases, a couple of
> seconds. This also seems to have increased in the recent past.

If you observe randomly spread response times with a background noise
looking like stairs at multiples of seconds (generally 3 secs), most of
the time this is caused by TCP retransmits due to losses anywhere between
a client and a server. If your logs report long connect times between
haproxy and your servers, then you can spot an issue in your infra. If you
are lucky enough to see long request times (those are rare), sometimes it
indicates that a client is having difficulties sending a request after the
connection is accepted.

If you want to check how your servers' response times are seen from
haproxy, then halog (in the contrib subdir) can help you. Use it with
-pct to get percentiles of connect and response times. And the newly
released 1.4.9 adds features to report response times by URL in halog.

Most of the time, the log files are the starting point, so that you can
find where to search and where not to search.

Regards,
Willy
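For reference, a typical way to run the percentile report mentioned above.
The log path is hypothetical; halog is not installed by default and is
built from the contrib/halog directory of the haproxy source tree:

```shell
# build halog from the haproxy sources
cd contrib/halog && make

# percentile breakdown of timers (connect, response, etc.)
# computed from an haproxy log fed on stdin
./halog -pct < /var/log/haproxy.log
```

Since halog reads from stdin, it composes naturally with grep or zcat to
restrict the report to one backend or one day of rotated logs.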
Re: [ANNOUNCE] haproxy 1.4.9
Thanks, Willy! I am especially excited about the new per-URL statistics,
super especially for the average time metric. However, I can't use these
flags with my build of 1.4.9 from source.

  # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
  HA-Proxy version 1.4.9 2010/10/28
  Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>

  Usage : /usr/local/sbin/haproxy [-f <cfgfile>]* [ -vdVD ] [ -n <maxconn> ]
          [ -N <maxpconn> ] [ -p <pidfile> ] [ -m <max megs> ]
          -v displays version ; -vv shows known build options.
          -d enters debug mode ; -db only disables background mode.
          -V enters verbose mode (disables quiet mode)
          -D goes daemon
          -q quiet mode : don't display messages
          -c check mode : only check config files and exit
          -n sets the maximum total # of connections (2000)
          -m limits the usable amount of memory (in MB)
          -N sets the default, per-proxy maximum # of connections (2000)
          -p writes pids of all children to this file
          -de disables epoll() usage even when available
          -ds disables speculative epoll() usage even when available
          -dp disables poll() usage even when available
          -sf/-st [pid ]* finishes/terminates old pids. Must be last arguments.

  ### same with only -u and only -uc as sanity tests

  # haproxy -vvv
  HA-Proxy version 1.4.9 2010/10/28
  Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>

  Build options :
    TARGET  = linux26
    CPU     = generic
    CC      = gcc
    CFLAGS  = -m32 -march=i386 -O2 -g
    OPTIONS = USE_PCRE=1

  Default settings :
    maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

  Encrypted password support via crypt(3): yes

  Available polling systems :
      sepoll : pref=400,  test result OK
       epoll : pref=300,  test result OK
        poll : pref=200,  test result OK
      select : pref=150,  test result OK
  Total: 4 (4 usable), will use sepoll.

Thanks! Sorry if I'm just missing it! :)

On Thu, Oct 28, 2010 at 3:40 PM, Willy Tarreau <w...@1wt.eu> wrote:
> The new feature of halog is per-URL statistics (req error counts, avg
> response time, total response time, and that for all or valid-only
> requests). The output is sorted by a field specified from the command
> line flag, among which URL (-u), req count (-uc), err count (-ue),
> total time (-ut), average time (-ua), total time on OK reqs (-uto) and
> avg time on OK reqs (-uao).
Re: [ANNOUNCE] haproxy 1.4.9
... sorry about those broken newlines. Here's the gist:
http://gist.github.com/652558

On Thu, Oct 28, 2010 at 4:27 PM, Carlo Flores <ca...@petalphile.com> wrote:
> Thanks, Willy! I am especially excited about the new per-URL statistics,
> super especially for the average time metric. However, I can't use these
> flags with my build of 1.4.9 from source.
>
> # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
> (...)
>
> Thanks! Sorry if I'm just missing it! :)
>
> On Thu, Oct 28, 2010 at 3:40 PM, Willy Tarreau <w...@1wt.eu> wrote:
> > The new feature of halog is per-URL statistics (req error counts, avg
> > response time, total response time, and that for all or valid-only
> > requests). The output is sorted by a field specified from the command
> > line flag, among which URL (-u), req count (-uc), err count (-ue),
> > total time (-ut), average time (-ua), total time on OK reqs (-uto)
> > and avg time on OK reqs (-uao).
Re: [ANNOUNCE] haproxy 1.4.9
On Friday, October 29, 2010 at 01:27:30, Carlo Flores wrote:
> I am especially excited about the new per-URL statistics, super
> especially for the average time metric. However, I can't use these
> flags with my build of 1.4.9 from source.
>
> # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
> (...)

Those options are not for haproxy itself but for halog (see the directory
contrib/halog in the sources archive) ;-)

--
Cyril Bonté
Re: [ANNOUNCE] haproxy 1.4.9
D'oh! Thank you, Cyril!

On Thu, Oct 28, 2010 at 4:45 PM, Cyril Bonté <cyril.bo...@free.fr> wrote:
> On Friday, October 29, 2010 at 01:27:30, Carlo Flores wrote:
> > I am especially excited about the new per-URL statistics, super
> > especially for the average time metric. However, I can't use these
> > flags with my build of 1.4.9 from source.
> >
> > # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
> > (...)
>
> Those options are not for haproxy itself but for halog (see the
> directory contrib/halog in the sources archive) ;-)
>
> --
> Cyril Bonté
Re: [ANNOUNCE] haproxy 1.4.9
On Thu, Oct 28, 2010 at 04:50:13PM -0700, Carlo Flores wrote:
> D'oh! Thank you, Cyril!
> (...)
> > # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
> (...)

And you should only use one of these -u* options at a time, since they
all do the same thing and just change the sorting order!

Willy
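Putting the thread's conclusion together, the per-URL report is invoked
against halog, one sorting flag per run. The log path is hypothetical:

```shell
# per-URL stats sorted by average response time
./halog -ua < /var/log/haproxy.log | head -20

# same report, sorted by request count instead
./halog -uc < /var/log/haproxy.log | head -20
```

Each -u* flag produces the same per-URL columns; the flag only selects
which column the output is sorted by, which is why combining them makes
no sense.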