RE: VM benchmarks

2010-10-28 Thread Angelo Höngens
I'm wondering what the difference would be between the standard slow e1000 
virtual network card and the fast paravirtualized vmxnet3 virtual network card. 
In theory, the latter one should be much, much faster.. 

-- 

 
With kind regards,
 
 
Angelo Höngens
 
Systems Administrator
 
--
NetMatch
tourism internet software solutions
 
Ringbaan Oost 2b
5013 CA Tilburg
T: +31 (0)13 5811088
F: +31 (0)13 5821239
 
mailto:a.hong...@netmatch.nl
http://www.netmatch.nl
--


 -Original Message-
 From: Les Stroud [mailto:l...@lesstroud.com]
 Sent: Wednesday, October 27, 2010 21:55
 To: Ariel
 Cc: haproxy
 Subject: Re: VM benchmarks
 
 Check out this thread I had earlier in the month on the same topic:
 http://www.formilux.org/archives/haproxy/1010/3910.html
 
 Bottom line: vmware will slow down your upper level transaction limit
 by a significant amount (like an order of magnitude).  The software
 drivers underneath the network stack and the system stack add enough
 overhead to reduce your maximum transaction ceiling to around 6000
 trans/sec on haproxy (this is without a backend constraint).  On a
 hardware device, I am seeing much higher numbers (50k).
 
 LES
 
 
 On Oct 26, 2010, at 10:38 AM, Ariel wrote:
 
  Does anyone know of studies done comparing haproxy on dedicated
 hardware vs virtual machine?  Or perhaps some virtual machine specific
 considerations?
  -a
 




Re: VM benchmarks

2010-10-28 Thread Willy Tarreau
On Thu, Oct 28, 2010 at 07:10:32AM +0000, Angelo Höngens wrote:
 I'm wondering what the difference would be between the standard slow e1000 
 virtual network card and the fast paravirtualized vmxnet3 virtual network 
 card. In theory, the latter one should be much, much faster.. 

We've tested that at Exceliance. Yes it's a lot faster. But still a lot
slower than the native machine. To give you an idea, you can get about
6000 connections per second under ESX on a machine that natively supports
between 25000 and 40000 depending on the NICs.

Regards,
Willy




stats page errors column

2010-10-28 Thread Joe Williams

List,

I didn't immediately see this in the docs. What types of errors (CD, sQ, etc.)
are included in the error columns labeled "conn" and "resp" on the haproxy
stats page?

Thanks.

-Joe


Name: Joseph A. Williams
Email: j...@joetify.com
Blog: http://www.joeandmotorboat.com/
Twitter: http://twitter.com/williamsjoe




Re: Slow TCP open on haproxy

2010-10-28 Thread Maxime Ducharme

On Tue, 2010-10-26 at 19:24 +0200, Willy Tarreau wrote:
 Hi Maxime,

Hi Willy

 
 On Tue, Oct 26, 2010 at 12:47:37PM -0400, Maxime Ducharme wrote:
  
  Hi guys
  
  I am new to haproxy and this list; my experience with this software has been
  very good so far.
  
  Got a question about socket tuning. We have a web site running on 10
  different httpd servers with 2 haproxy instances in front.
  
  We configured 3 IPs on each haproxy; we get about 2200 req/s each, and peak
  time is 3500 req/s each.
  
  Current load is actually very low on the haproxy boxes, but we have noticed
  some slow access to the website. Doing analysis we found out that
  sometimes opening a TCP socket on the haproxy box is slower than opening a
  socket directly on one of the httpd servers behind it.
  
  The actual configuration is quite simple, here is a snippet:
  
  global
  maxconn 32768
  nbproc 8
  
  defaults
  log global
  retries 3
  maxconn 32768
  contimeout 5000
  clitimeout 50000
  srvtimeout 50000
  
  listen weblb1 1.1.1.1:80
  bind 1.1.1.2:80
  bind 1.1.1.3:80
  
  mode http
  balance roundrobin  
  
  option forwardfor
  option httpchk HEAD / HTTP/1.0
  option httpclose
  stats enable
  server web1 1.1.2.1:80 weight 10 check port 80
  ..
  server web10 1.1.2.10:80 weight 10 check port 80
  
  
  We set nbproc to the same number of CPU cores as we have.
  
  We noticed the problem by tracing an HTTP request with curl, e.g.:
  
  15:14:13.684549 * About to connect() to www.website.com port 80 (#0)
  15:14:13.685620 *   Trying 1.1.1.1... connected
  -- 3 seconds here to open TCP connection
  15:14:16.796281 * Connected to www.website.com (1.1.1.1) port 80 (#0)
  15:14:16.797173  GET / HTTP/1.1
  -- httpd replies here in less than 1 second
 
 A 3 second delay is a typical SYN retransmit.

make sense

 
  This issue happens sometimes, not always.
  
  My question: can someone point me in a direction to look for socket
  optimization / debugging? I am currently unable to explain why it is
  slow; I know this is not hardware related since it is a very powerful box.
  I believe some tuning will make a big difference. Maybe we have some kernel
  tuning to do here; if someone can enlighten me it would be much
  appreciated.
 
 Two things to look for :
   - if you have ip_conntrack / nf_conntrack loaded, either you have to
 unload it, or to properly tune it for your usage (I'd recommend the
 former, it's easier).

good point, not loaded

 
   - check net.core.somaxconn (/proc/sys/net/core/somaxconn). If it's 128, then
 your TCP stack is not tuned for a high connection rate, and you're surely
 dropping incoming connections from time to time. Try to first increase that
 single parameter to 10000, restart haproxy and check if it changes anything.
 

It was set to 128. I raised the value to 10000 and we see better results now.
A new problem appeared though, which is:
Oct 28 19:07:40 v-2-fg09-d861-15 kernel: [735810.205858] TCP: drop open
request from 1.1.1.1/42274
Oct 28 19:07:45 v-2-fg09-d861-15 kernel: [735815.237132] TCP: drop open
request from 1.1.1.2/2847
Oct 28 19:07:50 v-2-fg09-d861-15 kernel: [735820.276368] TCP: drop open
request from 1.1.1.3/3925
Oct 28 19:07:55 v-2-fg09-d861-15 kernel: [735825.308858] TCP: drop open
request from 1.1.1.4/49952
...

I also see unreplied SYNs in netstat :
# netstat -an |grep SYN_RECV |grep -cv grep
1426

Now I am taking a look at tcp_max_syn_backlog value, I am thinking of
raising this value also but I would like to have your opinion. We see
this issue when req/s get to 2300/s, only in peak time of day. Rest of
day is ok and response time is excellent.
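
For reference, a sketch of the sysctl changes being discussed (the somaxconn
value follows Willy's suggestion above; the tcp_max_syn_backlog figure is only
an assumed starting point to test, not a recommendation from this thread):

  # check the current values
  sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog

  # raise them at runtime
  sysctl -w net.core.somaxconn=10000
  sysctl -w net.ipv4.tcp_max_syn_backlog=10000

  # to make the change persistent, add to /etc/sysctl.conf:
  #   net.core.somaxconn = 10000
  #   net.ipv4.tcp_max_syn_backlog = 10000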


 Note that you don't need 8 processes with that load, it will be harder to
 debug, health checks will not be synced, and stats will only be per-process.

Good, we now run 1 instance only.

 
  Another question :
  
  can I enable stats on a particular IP ?
 
 yes, simply put the stats enable statement in its own listen section.
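
For example, such a dedicated section could look like this (the address, port
and credentials are only placeholders):

  listen stats 1.1.1.4:8080
  mode http
  stats enable
  stats uri /stats
  stats auth admin:changeme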

thanks

 
 Last, with version 1.4, you can also reduce the connection rate by using
 option http-server-close instead of option httpclose. It will enable
 keep-alive on the client side. Do that only when you have fixed your
 issues, because doing so can mask the problem without fixing it, and you'll
 get it again later.

thanks for this also, I will look into this one after.
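
For later, that change would just be a matter of swapping one option in the
config snippet above (a sketch, 1.4 syntax):

  listen weblb1 1.1.1.1:80
  mode http
  # replace "option httpclose" with:
  option http-server-close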


 
 Regards,
 Willy
 
 

have a nice day

Maxime




Re: VM benchmarks

2010-10-28 Thread Cyril Bonté
On Thursday, October 28, 2010 at 15:58:55, Ariel wrote:
 Hi Cyril,
 My test wasn't designed to look at higher load averages (many users at
 once) since the problem I was looking at was just increased latency for
 all requests.

You mean that with only 1 request at a time through haproxy you obtain a 
response in 150ms, whereas a direct request gives a response in 10 to 30ms?
I agree, this looks really strange.

I reproduced nearly the same environment as you described and could not 
reproduce this latency (only 1 nginx instance in my case).
To be clear on the config I used (I didn't take time to set up a clean and tuned 
installation):
- 1 server running VirtualBox 3.2.8
  OS : Mandriva Cooker (not recently updated)
  Kernel : Linux localhost 2.6.35.6-server-1mnb #1 SMP ... x86_64 GNU/Linux
  CPU : Intel(R) Core(TM)2 Duo CPU E6750  @ 2.66GHz
  Memory : 4Gb
  IP : 192.168.0.128

  With 2 small VMs based on a Debian Lenny 5.0.6 :
Kernel : 2.6.26-2-amd64 #1 SMP ... x86_64 GNU/Linux

- Instance 1 :
  1 CPU allocated
  Memory : 512Mb
  IP : 192.168.0.23
  HAProxy 1.4.8 installed with your configuration (only one backend server
  pointing to the second VM instance)

- Instance 2 :
  1 CPU allocated
  Memory : 384Mb
  IP : 192.168.0.24
 nginx 0.7.65 embedding your ajax test

- 1 laptop used as the client
  OS : Ubuntu 10.10
  Kernel : 2.6.35-22-generic #35-Ubuntu SMP ... i686 GNU/Linux
  Memory : 2Gb

TEST 1 : Firefox/Firebug
- direct access to nginx via 192.168.0.24 : firebug shows response times about 
2ms
- access to haproxy via 192.168.0.23 : response times are about 3ms

TEST 2 : Chromium/Firebug lite
- direct access to nginx via 192.168.0.24 : response times between 10 and 15ms
- access to haproxy via 192.168.0.23 : response times still between 10 and 
15ms

TEST 3 : using ab for 10000 requests with a concurrency of 1 (no keepalive)
- via nginx : ab -n10000 -c1 http://192.168.0.24/ajax.txt
Percentage of the requests served within a certain time (ms)
  50%  2
  66%  2
  75%  2
  80%  2
  90%  2
  95%  2
  98%  2
  99%  3
 100% 16 (longest request)

- via haproxy : ab -n10000 -c1 http://192.168.0.23/ajax.txt
Percentage of the requests served within a certain time (ms)
  50%  3
  66%  3
  75%  4
  80%  4
  90%  4
  95%  4
  98%  5
  99%  5
 100% 14 (longest request)
The results are similar.

TEST 4 : using ab for 10000 requests with a concurrency of 10 (no keepalive)
- via nginx : ab -n10000 -c10 http://192.168.0.24/ajax.txt
Percentage of the requests served within a certain time (ms)
  50%  6
  66%  6
  75%  6
  80%  6
  90%  7
  95%  8
  98%  8
  99%  9
 100% 25 (longest request)

- via haproxy : ab -n10000 -c10 http://192.168.0.23/ajax.txt
Percentage of the requests served within a certain time (ms)
  50% 18
  66% 21
  75% 23
  80% 24
  90% 30
  95% 35
  98% 40
  99% 43
 100% 56 (longest request)
Ok, it starts to be less responsive but this is because the VirtualBox server 
now uses nearly 100% of its 2 CPU cores.
But this is still far from what you observe.

TEST 5 : using ab for 10000 requests with a concurrency of 100 (no keepalive)
Just to be quite agressive with the VMs.
- via nginx : ab -n10000 -c100 http://192.168.0.24/ajax.txt
Percentage of the requests served within a certain time (ms)
  50% 54
  66% 55
  75% 57
  80% 65
  90% 76
  95% 78
  98% 79
  99% 81
 100%    268 (longest request)

- via haproxy : ab -n10000 -c100 http://192.168.0.23/ajax.txt
Percentage of the requests served within a certain time (ms)
  50%    171
  66%    184
  75%    192
  80%    198
  90%    217
  95%    241
  98%    287
  99%    314
 100%   3153 (longest request)

I can't help you much more, but I hope these results will give you some points 
of comparison. What is the hardware of your VirtualBox server?

-- 
Cyril Bonté



Question regarding cookie

2010-10-28 Thread Guillaume Bourque

Hi all,

Let's say I have 2 sites that are served with the same haproxy instance.

If I go directly to site1, all is fine: I'm using one of the servers of the 
site1 backend.


If I go directly to site2, all is fine: I'm using one of the servers of the site2 
backend.


But from site1, if I click a link to go to site2, it won't work.

Instances #1 and #2 share physical servers, but with different cookies, 
because they use different backends: some are apache, others are tomcat.


What I'm thinking is that if I open a browser to go directly to site1 or 
site2, all is fine since I have no cookie.


But if I click through to site2 from within site1, I probably already have a 
cookie for site1 in the request, and I end up with a file not found.


I went to the doc and I'm pretty sure cookie rewrite or similar will 
help me but I would like to have your input on this kind of setup.


cookie SERVERID indirect

or 


cookie SERVERID rewrite

Which one should I use?

Another one

I use this command to dump http data with tcpdump but I'm sure there is a 
simpler one

tcpdump -s 0 -A -i any 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'


Thanks for your input !

--
Guillaume Bourque, B.Sc.,
consultant, infrastructures technologiques libres !
Logisoft Technologies inc.  http://www.logisoftech.com
514 576-7638, http://ca.linkedin.com/in/GuillaumeBourque/fr




Re: stats page errors column

2010-10-28 Thread Willy Tarreau
Hi Joe,

On Thu, Oct 28, 2010 at 09:24:42AM -0700, Joe Williams wrote:
 
 List,
 
 I didn't immediately see this in the docs. What types of errors (CD, sQ, etc.) 
 are included in the error columns labeled "conn" and "resp" on the 
 haproxy stats page?

For the "conn" column, those are the failed connection attempts (timeouts or
rejects). Normally they'll be sC and SC.

For the "resp" column, all the ones that are caused by the server after
the connection was established. Typically sH, SH, and PH when the
server returns crap. Up to and including 1.4.8, there was a bug resulting
in tcp-request rules incrementing the "resp" column instead of the "req" column
when blocking. This was fixed in 1.4.9.

Hoping this helps,
Willy




Re: Question regarding cookie

2010-10-28 Thread Willy Tarreau
Hi Guillaume,

On Thu, Oct 28, 2010 at 05:56:20PM -0400, Guillaume Bourque wrote:
 Hi all,
 
 Let's say I have 2 sites that are served with the same haproxy instance.
 
 If I go directly to site1, all is fine: I'm using one of the servers of the 
 site1 backend
 
 If I go directly to site2, all is fine: I'm using one of the servers of the site2 
 backend
 
 But from site1, if I click a link to go to site2, it won't work
 
 Instances #1 and #2 share physical servers, but with different cookies, 
 because they use different backends: some are apache, others are tomcat.
 
 What I'm thinking is that if I open a browser to go directly to site1 or 
 site2, all is fine since I have no cookie
 
 But if I click through to site2 from within site1, I probably already have a 
 cookie for site1 in the request, and I end up with a file not found.

No because if your sites have different names, the browser takes extreme
care not to send the cookie to the wrong one. This is a big security
concern before anything else.

If your sites are in fact sub-directories of the same host name, you'd
probably prefer to use different cookie names then, so that the browser
can learn them separately.
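
For illustration, that could look something like this (backend names, cookie
names and server addresses are made up):

  backend site1
  cookie SITE1ID insert indirect
  server ap1 1.1.2.1:80 cookie ap1 check

  backend site2
  cookie SITE2ID insert indirect
  server tc1 1.1.2.1:8080 cookie tc1 check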

Or maybe there is something special in your setup that I did not get ?

Regards,
Willy




Re: Troubleshooting response times

2010-10-28 Thread Willy Tarreau
Hi Guy,

On Wed, Oct 27, 2010 at 12:49:16PM -0700, g...@desgames.com wrote:
 Hi all,
 
 We're trying to narrow down the source of delays we're seeing in
 response times from our web cluster. Using firebug, we're seeing that
 scripts are taking around 10 - 50 ms to complete (we're returning that
 in the response data), but the total response time shown by firebug is
 anywhere between 100ms all the way up to, in some cases, a couple of
 seconds. This also seems to have increased in the recent past.

if you observe randomly spread response times with a background noise
looking like stairs at multiple seconds (generally 3 secs), most of the
time this is caused by TCP retransmits due to losses anywhere between
a client and a server. If your logs report long connect times between
haproxy and your servers, then you can spot an issue in your infra. If
you are lucky to see long request times (those are rare), sometimes it
indicates that a client is having difficulties sending a request after
the connection is accepted.

If you want to check how your server's response times are seen from
haproxy, then halog (in the contrib subdir) can help you. Use it with
-pct to get a percentile of connect and response times. And the newly
released 1.4.9 adds features to report response times by URL in halog.

Most of the time, the log files are the starting point, so that you
can find where to search and where not to search.
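
As a sketch, assuming halog was built from contrib/halog and that the log sits
in the usual place (the path is just an example):

  # percentiles of the connect/response timers from the haproxy log
  halog -pct < /var/log/haproxy.log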

Regards,
Willy




Re: [ANNOUNCE] haproxy 1.4.9

2010-10-28 Thread Carlo Flores
Thanks, Willy!

I am especially excited about the new per-URL statistics, super especially
for the average time metric.  However, I can't use these flags with my build
of 1.4.9 from source.

# /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
HA-Proxy version 1.4.9 2010/10/28
Copyright 2000-2010 Willy Tarreau w...@1wt.eu

Usage : /usr/local/sbin/haproxy [-f cfgfile]* [ -vdVD ] [ -n maxconn ] [
-N maxpconn ]
[ -p pidfile ] [ -m max megs ]
-v displays version ; -vv shows known build options.
-d enters debug mode ; -db only disables background mode.
-V enters verbose mode (disables quiet mode)
-D goes daemon
-q quiet mode : don't display messages
-c check mode : only check config files and exit
-n sets the maximum total # of connections (2000)
-m limits the usable amount of memory (in MB)
-N sets the default, per-proxy maximum # of connections (2000)
-p writes pids of all children to this file
-de disables epoll() usage even when available
-ds disables speculative epoll() usage even when available
-dp disables poll() usage even when available
-sf/-st [pid ]* finishes/terminates old pids. Must be last
arguments.

# ### same with only -u and only -uc as sanity tests.
# # haproxy -vvv
HA-Proxy version 1.4.9 2010/10/28
Copyright 2000-2010 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m32 -march=i386 -O2 -g
  OPTIONS = USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

# ### Thanks! Sorry if I'm just missing it! :)





On Thu, Oct 28, 2010 at 3:40 PM, Willy Tarreau w...@1wt.eu wrote:

 The new feature of halog is per-URL statistics (req & error counts, avg
 response time, total response time, and that for all or valid only
 requests).
 The output is sorted by a field specified from the command line flag, among
 which URL (-u), req count (-uc), err count (-ue), total time (-ut), average
 time (-ua), total time on OK reqs (-uto) and avg time on OK reqs (-uao).



Re: [ANNOUNCE] haproxy 1.4.9

2010-10-28 Thread Carlo Flores
... sorry about those broken new lines.   Here's the gist:
http://gist.github.com/652558

On Thu, Oct 28, 2010 at 4:27 PM, Carlo Flores ca...@petalphile.com wrote:

 Thanks, Willy!

 I am especially excited about the new per-URL statistics, super especially
 for the average time metric.  However, I can't use these flags with my build
 of 1.4.9 from source.

 # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao
 HA-Proxy version 1.4.9 2010/10/28
 Copyright 2000-2010 Willy Tarreau w...@1wt.eu

 Usage : /usr/local/sbin/haproxy [-f cfgfile]* [ -vdVD ] [ -n maxconn ]
 [ -N maxpconn ]
 [ -p pidfile ] [ -m max megs ]
 -v displays version ; -vv shows known build options.
 -d enters debug mode ; -db only disables background mode.
 -V enters verbose mode (disables quiet mode)
 -D goes daemon
 -q quiet mode : don't display messages
 -c check mode : only check config files and exit
 -n sets the maximum total # of connections (2000)
 -m limits the usable amount of memory (in MB)
 -N sets the default, per-proxy maximum # of connections (2000)
 -p writes pids of all children to this file
 -de disables epoll() usage even when available
 -ds disables speculative epoll() usage even when available
 -dp disables poll() usage even when available
 -sf/-st [pid ]* finishes/terminates old pids. Must be last
 arguments.

 # ### same with only -u and only -uc as sanity tests.
 # # haproxy -vvv
 HA-Proxy version 1.4.9 2010/10/28
 Copyright 2000-2010 Willy Tarreau w...@1wt.eu

 Build options :
   TARGET  = linux26
   CPU     = generic
   CC      = gcc
   CFLAGS  = -m32 -march=i386 -O2 -g
   OPTIONS = USE_PCRE=1

 Default settings :
   maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

 Encrypted password support via crypt(3): yes

 Available polling systems :
      sepoll : pref=400,  test result OK
       epoll : pref=300,  test result OK
        poll : pref=200,  test result OK
      select : pref=150,  test result OK
 Total: 4 (4 usable), will use sepoll.

 # ### Thanks! Sorry if I'm just missing it! :)





 On Thu, Oct 28, 2010 at 3:40 PM, Willy Tarreau w...@1wt.eu wrote:

  The new feature of halog is per-URL statistics (req & error counts, avg
 response time, total response time, and that for all or valid only
 requests).
 The output is sorted by a field specified from the command line flag,
 among
 which URL (-u), req count (-uc), err count (-ue), total time (-ut),
 average
 time (-ua), total time on OK reqs (-uto) and avg time on OK reqs (-uao).






Re: [ANNOUNCE] haproxy 1.4.9

2010-10-28 Thread Cyril Bonté
On Friday, October 29, 2010 at 01:27:30, Carlo Flores wrote:
 I am especially excited about the new per-URL statistics, super especially
 for the average time metric.  However, I can't use these flags with my
 build of 1.4.9 from source.
 
 # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao (...)

Those options are not for haproxy itself but for halog (see the directory 
contrib/halog in the sources archive) ;-)
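
For example, against a 1.4.9 build of contrib/halog (the log path is only
illustrative):

  # per-URL report sorted by request count
  halog -uc < /var/log/haproxy.log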

-- 
Cyril Bonté



Re: [ANNOUNCE] haproxy 1.4.9

2010-10-28 Thread Carlo Flores
D'oh!  Thank you, Cyril!

On Thu, Oct 28, 2010 at 4:45 PM, Cyril Bonté cyril.bo...@free.fr wrote:

 On Friday, October 29, 2010 at 01:27:30, Carlo Flores wrote:
  I am especially excited about the new per-URL statistics, super
 especially
  for the average time metric.  However, I can't use these flags with my
  build of 1.4.9 from source.
 
  # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao (...)

 Those options are not for haproxy itself but for halog (see the directory
 contrib/halog in the sources archive) ;-)

 --
 Cyril Bonté



Re: [ANNOUNCE] haproxy 1.4.9

2010-10-28 Thread Willy Tarreau
On Thu, Oct 28, 2010 at 04:50:13PM -0700, Carlo Flores wrote:
 D'oh!  Thank you, Cyril!

(...)
   # /usr/local/sbin/haproxy -u -uc -ue -ut -ua -uto -uao (...)

and you should only use one of these -u* at a time, since they all do
the same thing and just change the sorting order !

Willy