Hi Dominik,

On Fri, Mar 30, 2012 at 03:52:20PM +0200, Mostowiec Dominik wrote:
> Hi,
> Thanks for the response.
> 
> I have another problem:
>
> 11:20:58.713922 IP siege_host.46589 > loadbalancer.8123: Flags [S], seq 
> 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP loadbalancer.8123 > siege_host.46589: Flags [S.], seq 
> 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP siege_host.46589 > loadbalancer.8123: Flags [.], ack 1, 
> win 913, length 0
> 11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 
> 1:151, ack 1, win 913, length 150
> 11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, 
> ack 1, win 913, length 0
> 11:21:00.717254 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 1, 
> win 229, length 0

Did you notice that your request packet (the 4th one) was lost on the network?

That's one reason why we always want to set timeouts above 3 seconds (generally
4 or 5), so that they cover one TCP retransmit. I guess you captured on the
siege_host (you did not use -vv nor -S so some info is missing)? Also, you
should be careful with the system config on siege_host, as it does not have
SACK enabled, which makes things worse when your network is lossy.
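
If you re-run the capture, -S gives absolute sequence numbers and -vv more
detail, and you can check whether SACK is enabled on siege_host. A rough
sketch (the interface name and port are just examples taken from your trace):

   # capture with absolute seq numbers and verbose decoding
   tcpdump -nn -vv -S -i eth0 'port 8123'

   # check SACK on siege_host (1 = enabled), and turn it on if needed
   sysctl net.ipv4.tcp_sack
   sysctl -w net.ipv4.tcp_sack=1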

This packet loss is the reason for the pause you observe, since the request
never reaches haproxy. If you increase your siege timeout above 3s you'll see
that many requests take 3s to be processed due to the retransmit, and that
others still fail. You really need to find what is causing these losses and
fix it; it's impossible to run a benchmark on a lossy network! Check your
switches and your NICs, and make sure you're not running an old bnx2 NIC
with old firmware.
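
To rule out the NIC itself, something along these lines (eth0 is just an
example) shows the driver/firmware version and the error/drop counters:

   # driver and firmware version
   ethtool -i eth0
   # NIC statistics, look for rx/tx errors and drops
   ethtool -S eth0
   # kernel-level drop counters on the interface
   ip -s link show eth0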

BTW I have a few comments about your config:

> global
>     maxconn 163937

What's the reason for this magic number?

>     user haproxy
>     group haproxy
>     daemon
>     nbproc 16

Wow, 16 processes! I don't know what you intend to achieve with that, but it
will generally not bring anything and may even reduce performance.

> defaults
>     log global
>     mode        http
>     option      httplog
>     option      dontlognull
>     option      forwardfor
>     retries     1
>     contimeout  1s

 A 1s connect timeout is below the 3s retransmit delay, see above.

>     clitimeout  33s
>     srvtimeout  33s
>     grace 7s

grace serves no purpose these days, especially if all instances share
the same setting (the goal was to make some instances stop before the
others in order to fail external health checks).

I see that you have no default maxconn, so your frontends will still
be limited by the default maxconn (2000).
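
As a sketch only (the numbers are just examples), a larger connect timeout
and an explicit maxconn in the defaults section would look like this:

   defaults
       contimeout  5s       # above 3s so it covers one TCP retransmit
       maxconn     10000    # otherwise frontends keep the 2000 default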

(...)
> Haproxy is started with "-n 163937 -N 163937" options.

OK, so -N sets it. It's still a strange value anyway.

> I attached stats for test when nbproc is set to '1'.

Hmmm, the load was very low:

   691 MB/20k conn = 34kB per connection
   At peak you reached 34kB*850 sess/s = 29 MB/s ~= 250 Mbps

It's very concerning that you're experiencing network losses at this
rate. Just a hint: it's more likely that the losses are located on the
siege host, or between it and the network, than on the haproxy host,
because when you run haproxy on a lossy machine you generally observe
failed health checks, which you did not have here during the test.

> Something is wrong with my configuration?

Not particularly, aside from the strange numbers.

Regards,
Willy

