Hi Dominik,

On Fri, Mar 30, 2012 at 03:52:20PM +0200, Mostowiec Dominik wrote:
> Hi,
> Thanks for the response.
>
> I have another problem:
>
> 11:20:58.713922 IP siege_host.46589 > loadbalancer.8123: Flags [S], seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP loadbalancer.8123 > siege_host.46589: Flags [S.], seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP siege_host.46589 > loadbalancer.8123: Flags [.], ack 1, win 913, length 0
> 11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150
> 11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, ack 1, win 913, length 0
> 11:21:00.717254 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 1, win 229, length 0

Did you notice that your request packet (the 4th) was lost on the network?
That's one reason why we always want to set timeouts above 3 sec (generally
4 or 5), so that they cover one TCP retransmit. I guess you captured on the
siege_host (you used neither -vv nor -S, so some information is missing)?
Also, you should be careful with the system config on siege_host, as it does
not have SACK enabled, which makes things worse when your network is lossy.

This packet loss is the reason for the pause you observe, since the request
never reaches haproxy. If you increase your siege timeout above 3s, you'll
see that many requests take 3s to be processed due to the retransmit, and
that other ones still fail.

You really need to find what is causing these losses and fix it; it's
impossible to run a benchmark on a lossy network! Check your switches and
your NICs. Ensure you're not running an old bnx2 NIC with an old firmware.

BTW I have a few comments about your config:

> global
>     maxconn 163937

What's the reason for this magic number?

>     user haproxy
>     group haproxy
>     daemon
>     nbproc 16

Wow, 16 procs! I don't know what you intend to do, but it will generally not
bring anything and might even reduce the performance.

> defaults
>     log global
>     mode http
>     option httplog
>     option dontlognull
>     option forwardfor
>     retries 1
>     contimeout 1s

That's below the 3s timeout, see above.

>     clitimeout 33s
>     srvtimeout 33s
>     grace 7s

grace serves no purpose these days, especially if all instances share the
same setting (the goal was to make some instances stop before other ones to
fail external health checks).

I see that you have no default maxconn, so your frontends will still be
limited by the default maxconn (2000).

(...)
> Haproxy is started with "-n 163937 -N 163937" options.

OK so -N sets it. Still a strange value anyway.

> I attached stats for test when nbproc is set to '1'.

Hmmm, the load was very low:

   691 MB / 20k conn = 34 kB per connection

At peak you reached:

   34 kB * 850 sess/s = 29 MB/s ~= 250 Mbps

It's very concerning that you're experiencing network losses at this rate.

Just a hint: it's more likely that the losses are located on the siege host,
or between it and the network, than on the haproxy host. When you run haproxy
on a lossy machine you generally observe failed health checks, which you
didn't have here during the test.

> Somthing is wrong with my configuration ?

Not particularly, leaving aside the strange numbers.

Regards,
Willy
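
PS: the throughput estimate above can be reproduced with a short Python sketch. It is only a back-of-the-envelope check: it assumes decimal units (1 MB = 1000 kB), reads "20k conn" as exactly 20000 connections, and the function and variable names are made up for illustration. Without rounding the intermediate values the result comes out nearer 235 Mbps, which is the same order of magnitude as the rough 250 Mbps figure.

```python
def estimate_peak(total_mb, connections, peak_sessions_per_s):
    """Rough peak-bandwidth estimate from aggregate transfer stats.

    Assumes decimal units (1 MB = 1000 kB) and that every connection
    carries the average amount of data; both are simplifications.
    """
    kb_per_conn = total_mb * 1000 / connections
    peak_mb_per_s = kb_per_conn * peak_sessions_per_s / 1000
    peak_mbps = peak_mb_per_s * 8  # bytes -> bits
    return kb_per_conn, peak_mb_per_s, peak_mbps


if __name__ == "__main__":
    # Figures taken from the stats discussed above.
    kb, mb_s, mbps = estimate_peak(691, 20_000, 850)
    print(f"{kb:.2f} kB per connection")
    print(f"{mb_s:.1f} MB/s at peak")
    print(f"{mbps:.0f} Mbps at peak")
```

A few hundred Mbps is a modest load, which is why losses at this rate point to a host or network problem rather than to saturation.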