Re: CPU saturated with 250Mbps traffic on frontend

2015-04-07 Thread Evgeniy Sudyr
Willy,

I will post results when available.

--
Evgeniy

On Mon, Apr 6, 2015 at 3:24 PM, Willy Tarreau  wrote:
> On Mon, Apr 06, 2015 at 02:54:13PM +0200, Evgeniy Sudyr wrote:
>> This is a server with 2x Intel I350-T4 quad-port 1G NICs. On the first
>> card each port is connected to an uplink provider, and the four ports of
>> the second card are used for a trunk interface with LACP, connected to an
>> internal 1Gb switch that also has LACP configured. I've tested the uplinks
>> and the internal link with iperf and saw at least 900 Mbps in TCP tests.
>
> You may want to retry without LACP. A long time ago on Linux, the bonding
> driver used not to propagate NIC-specific optimizations and sometimes
> resulted in worse performance than without it. Also, I don't know whether
> you're using VLANs, or whether OpenBSD supports checksum offloading on
> VLANs, but that could also be something which limits the list of possible
> optimizations/offloads that normally result in lower CPU usage.
>
>> The cards seem to be OK. HAProxy definitely needs to be moved to separate
>> servers in the internal network.
>
> Makes sense. Then make sure to use a distro with a kernel 3.10 or above;
> that's where you'll get the best performance.
>
>> Btw, where did Pavlos report his test results? On this list or somewhere else?
>
> It was posted one or two weeks ago on this list, yes. I must say I was
> quite happy to see someone else post results of the same order of magnitude
> as I encounter in my own tests, because at least I won't be suspected of
> cheating anymore :-)
>
> Cheers,
> Willy
>



-- 
--
With regards,
Eugene Sudyr



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Willy Tarreau
On Mon, Apr 06, 2015 at 02:54:13PM +0200, Evgeniy Sudyr wrote:
> This is a server with 2x Intel I350-T4 quad-port 1G NICs. On the first
> card each port is connected to an uplink provider, and the four ports of
> the second card are used for a trunk interface with LACP, connected to an
> internal 1Gb switch that also has LACP configured. I've tested the uplinks
> and the internal link with iperf and saw at least 900 Mbps in TCP tests.

You may want to retry without LACP. A long time ago on Linux, the bonding
driver used not to propagate NIC-specific optimizations and sometimes
resulted in worse performance than without it. Also, I don't know whether
you're using VLANs, or whether OpenBSD supports checksum offloading on
VLANs, but that could also be something which limits the list of possible
optimizations/offloads that normally result in lower CPU usage.
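
For reference, one way to check what the driver exposes on OpenBSD is the
hwfeatures modifier of ifconfig (the interface names below are examples
only, and the modifier may not be available on older releases):

  ifconfig em0 hwfeatures       # offloads the physical NIC/driver advertises
  ifconfig vlan100 hwfeatures   # what the vlan pseudo-interface still offers
  ifconfig trunk0 hwfeatures    # same check for the LACP trunk interface

On Linux the equivalent check for a bond would be "ethtool -k bond0".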

> The cards seem to be OK. HAProxy definitely needs to be moved to separate
> servers in the internal network.

Makes sense. Then make sure to use a distro with a kernel 3.10 or above;
that's where you'll get the best performance.

> Btw, where did Pavlos report his test results? On this list or somewhere else?

It was posted one or two weeks ago on this list, yes. I must say I was
quite happy to see someone else post results of the same order of magnitude
as I encounter in my own tests, because at least I won't be suspected of
cheating anymore :-)

Cheers,
Willy




Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Baptiste
On Mon, Apr 6, 2015 at 2:54 PM, Evgeniy Sudyr  wrote:
> Btw, where did Pavlos report his test results? On this list or somewhere else?

On this ML.
Pavlos was running Linux ;)

Baptiste



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Evgeniy Sudyr
This is a server with 2x Intel I350-T4 quad-port 1G NICs. On the first
card each port is connected to an uplink provider, and the four ports of
the second card are used for a trunk interface with LACP, connected to an
internal 1Gb switch that also has LACP configured. I've tested the uplinks
and the internal link with iperf and saw at least 900 Mbps in TCP tests.
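
For reference, the iperf runs were along these lines (the address and
options are illustrative, not the exact commands used):

  # on the receiving end
  iperf -s
  # on the sending end: 30-second TCP test with 4 parallel streams
  iperf -c 192.0.2.10 -t 30 -P 4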

The cards seem to be OK. HAProxy definitely needs to be moved to separate
servers in the internal network.

Btw, where did Pavlos report his test results? On this list or somewhere else?

Thanks again!

--
Evgeniy


On Mon, Apr 6, 2015 at 12:48 PM, Willy Tarreau  wrote:
> On Mon, Apr 06, 2015 at 12:34:05PM +0200, Evgeniy Sudyr wrote:
>> Hi Willy,
>>
>> it's a pleasure to get an answer from you!
>>
>> 1) I've tested with OpenBSD's SP kernel and a single process (no nbproc)
>> in haproxy.conf, and there was no significant difference in load.
>
> OK, I was not sure whether it was the SP kernel or just no nbproc.
>
>> I can't test with PF disabled, because it's a production
>> router.
>
> I can understand, the test needs to be run on a test machine.
>
>> 2) I guess the solution is to get separate load-balancing servers running
>> Debian with better CPUs, and run the tests there.
>
> I wouldn't give up on OpenBSD too fast. It's an excellent OS when you
> want a "drop and forget" solution; it's just that it's not very fast. If
> you manage to find what is causing this high load, maybe you can
> work around it or find some tunables.
>>
>> 3) What are "good numbers"? I've tried to find some recent benchmarks
>> for haproxy on commodity hardware, but not much is available.
>
> Pavlos recently reported 438000 requests/s. I'm used to seeing about 110-120k
> end-to-end connections per second on high-frequency Xeon CPUs. Bandwidth
> is really cheap these days with the proper NICs: with moderately large
> objects (250 kB or more) it's not hard to reach 40 Gbps on a recent
> machine equipped with one 40G or four 10G NICs.
>
> Just thinking about something: since you're reporting 250-300 Mbps, I
> guess you're running on 1 Gbps NICs. Are you using good-quality NICs?
> By good I mean: aren't you running on low-end Realteks or similar, which
> can require significant work on the driver side and would thus explain the
> high CPU usage in interrupts?
>
> Willy
>



-- 
--
With regards,
Eugene Sudyr



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Willy Tarreau
On Mon, Apr 06, 2015 at 12:34:05PM +0200, Evgeniy Sudyr wrote:
> Hi Willy,
> 
> it's a pleasure to get an answer from you!
> 
> 1) I've tested with OpenBSD's SP kernel and a single process (no nbproc)
> in haproxy.conf, and there was no significant difference in load.

OK, I was not sure whether it was the SP kernel or just no nbproc.

> I can't test with PF disabled, because it's a production
> router.

I can understand, the test needs to be run on a test machine.

> 2) I guess the solution is to get separate load-balancing servers running
> Debian with better CPUs, and run the tests there.

I wouldn't give up on OpenBSD too fast. It's an excellent OS when you
want a "drop and forget" solution; it's just that it's not very fast. If
you manage to find what is causing this high load, maybe you can
work around it or find some tunables.
> 
> 3) What are "good numbers"? I've tried to find some recent benchmarks
> for haproxy on commodity hardware, but not much is available.

Pavlos recently reported 438000 requests/s. I'm used to seeing about 110-120k
end-to-end connections per second on high-frequency Xeon CPUs. Bandwidth
is really cheap these days with the proper NICs: with moderately large
objects (250 kB or more) it's not hard to reach 40 Gbps on a recent
machine equipped with one 40G or four 10G NICs.

Just thinking about something: since you're reporting 250-300 Mbps, I
guess you're running on 1 Gbps NICs. Are you using good-quality NICs?
By good I mean: aren't you running on low-end Realteks or similar, which
can require significant work on the driver side and would thus explain the
high CPU usage in interrupts?

Willy




Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Evgeniy Sudyr
Hi Willy,

it's a pleasure to get an answer from you!

1) I've tested with OpenBSD's SP kernel and a single process (no nbproc)
in haproxy.conf, and there was no significant difference in load.

I can't test with PF disabled, because it's a production router.

2) I guess the solution is to get separate load-balancing servers running
Debian with better CPUs, and run the tests there.

3) What are "good numbers"? I've tried to find some recent benchmarks
for haproxy on commodity hardware, but not much is available.

--
Evgeniy




On Mon, Apr 6, 2015 at 11:59 AM, Willy Tarreau  wrote:
> Hi Evgeniy,
>
> On Sun, Apr 05, 2015 at 06:29:53PM +0200, Evgeniy Sudyr wrote:
>> Nenad,
>>
>> thank you for your answer!
>>
>> 1) This is the only HAProxy server active (an active/passive config exists,
>> but using carp on OpenBSD).
>>
>> 2) As I understand it, with nbproc 4 I can't get stats working correctly ...
>>
>> However, at the moment I see that for the https frontend I have:
>> Current connection rate:58/s
>> Current session rate:53/s
>> Current request rate:124/s
>>
>> For http frontend:
>> Current connection rate:240/s
>> Current session rate:240/s
>> Current request rate:542/s
>
> These numbers are really low.
>
>>
>> 3) current top output (total in/out for HTTP/HTTPS traffic on the external
>> interfaces is avg 300 Mbps, and this is only HAProxy traffic):
>>
>> load averages:  4.02,  3.92,  3.88
>> router2 19:28:18
>> 32 processes: 1 running, 27 idle, 4 on processor
>> CPU0 states: 12.6% user,  0.0% nice, 11.2% system, 60.9% interrupt, 15.4% idle
>> CPU1 states: 25.2% user,  0.0% nice, 47.0% system,  0.2% interrupt, 27.6% idle
>> CPU2 states: 25.1% user,  0.0% nice, 43.3% system,  0.6% interrupt, 30.9% idle
>> CPU3 states: 21.6% user,  0.0% nice, 48.2% system,  0.2% interrupt, 30.0% idle
>> Memory: Real: 1017M/1709M act/tot Free: 14G Cache: 111M Swap: 0K/16G
>
> This huge CPU usage in interrupts definitely reminds me of performance issues
> related to pf that I used to face a long time ago. The performance would double
> or triple just after issuing "pfctl -d" (to disable it). At least it's easy
> to test. I've never tested OpenBSD's network stack in SMP yet; it could be
> that it comes with some extra cost (for locking or whatever), but
> it might be something else as well.
>
> Regards,
> Willy
>



-- 
--
With regards,
Eugene Sudyr



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-06 Thread Willy Tarreau
Hi Evgeniy,

On Sun, Apr 05, 2015 at 06:29:53PM +0200, Evgeniy Sudyr wrote:
> Nenad,
> 
> thank you for your answer!
> 
> 1) This is the only HAProxy server active (an active/passive config exists,
> but using carp on OpenBSD).
> 
> 2) As I understand it, with nbproc 4 I can't get stats working correctly ...
> 
> However, at the moment I see that for the https frontend I have:
> Current connection rate:58/s
> Current session rate:53/s
> Current request rate:124/s
> 
> For http frontend:
> Current connection rate:240/s
> Current session rate:240/s
> Current request rate:542/s

These numbers are really low.

> 
> 3) current top output (total in/out for HTTP/HTTPS traffic on the external
> interfaces is avg 300 Mbps, and this is only HAProxy traffic):
> 
> load averages:  4.02,  3.92,  3.88
> router2 19:28:18
> 32 processes: 1 running, 27 idle, 4 on processor
> CPU0 states: 12.6% user,  0.0% nice, 11.2% system, 60.9% interrupt, 15.4% idle
> CPU1 states: 25.2% user,  0.0% nice, 47.0% system,  0.2% interrupt, 27.6% idle
> CPU2 states: 25.1% user,  0.0% nice, 43.3% system,  0.6% interrupt, 30.9% idle
> CPU3 states: 21.6% user,  0.0% nice, 48.2% system,  0.2% interrupt, 30.0% idle
> Memory: Real: 1017M/1709M act/tot Free: 14G Cache: 111M Swap: 0K/16G

This huge CPU usage in interrupts definitely reminds me of performance issues
related to pf that I used to face a long time ago. The performance would double
or triple just after issuing "pfctl -d" (to disable it). At least it's easy
to test. I've never tested OpenBSD's network stack in SMP yet; it could be
that it comes with some extra cost (for locking or whatever), but
it might be something else as well.
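
For reference, a minimal test sequence on the OpenBSD side (standard pfctl
and system tools; run it in a quiet window, since disabling pf drops the
firewall rules for the duration) could look like:

  vmstat -i       # snapshot of per-device interrupt counters, before
  pfctl -d        # disable pf
  systat vmstat   # watch CPU and interrupt activity live while traffic flows
  pfctl -e        # re-enable pf
  vmstat -i       # snapshot after, to compare interrupt rates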

Regards,
Willy




Re: CPU saturated with 250Mbps traffic on frontend

2015-04-05 Thread Evgeniy Sudyr
Nenad,

thank you for your answer!

1) This is the only HAProxy server active (an active/passive config exists,
but using carp on OpenBSD).

2) As I understand it, with nbproc 4 I can't get stats working correctly ...
(see the per-process stats sketch after the top output below)

However, at the moment I see that for the https frontend I have:
Current connection rate:58/s
Current session rate:53/s
Current request rate:124/s

For http frontend:
Current connection rate:240/s
Current session rate:240/s
Current request rate:542/s

3) current top output (total in/out for HTTP/HTTPS traffic on the external
interfaces is avg 300 Mbps, and this is only HAProxy traffic):

load averages:  4.02,  3.92,  3.88
router2 19:28:18
32 processes: 1 running, 27 idle, 4 on processor
CPU0 states: 12.6% user,  0.0% nice, 11.2% system, 60.9% interrupt, 15.4% idle
CPU1 states: 25.2% user,  0.0% nice, 47.0% system,  0.2% interrupt, 27.6% idle
CPU2 states: 25.1% user,  0.0% nice, 43.3% system,  0.6% interrupt, 30.9% idle
CPU3 states: 21.6% user,  0.0% nice, 48.2% system,  0.2% interrupt, 30.0% idle
Memory: Real: 1017M/1709M act/tot Free: 14G Cache: 111M Swap: 0K/16G

  PID USERNAME PRI NICE  SIZE   RES STATE WAIT  TIMECPU COMMAND
24285 _haproxy  640  302M  154M onproc-38:00 80.13% haproxy
26781 _haproxy   20  295M  147M run   -33:40 77.98% haproxy
26267 _haproxy  640  297M  149M onproc-35:32 76.86% haproxy
23731 _haproxy  640  291M  143M onproc-31:16 75.98% haproxy
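
(On point 2 above: with nbproc > 1 each process keeps its own counters, so a
common workaround is one small stats listener pinned to each process. This is
only a sketch — the section names and ports are made-up examples:)

  listen stats_proc1
      bind-process 1
      bind :9101
      mode http
      stats enable
      stats uri /

  listen stats_proc2
      bind-process 2
      bind :9102
      mode http
      stats enable
      stats uri /

  # ... and likewise for processes 3 and 4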

On Sun, Apr 5, 2015 at 5:15 PM, Nenad Merdanovic  wrote:
> Evgeniy,
>
> On 4/5/2015 4:47 PM, Evgeniy Sudyr wrote:
>>
>> Lukas, thank you for pointing out the possible keep-alive issues. I've
>> tested this before, but did it again just to make one more check!
>>
>> I've increased the keep-alive timeout to 10s, removed
>> http-server-close, and restarted haproxy :)
>>
>>
>> Changes I've noted: haproxy dropped from avg 78% to avg 75% on each CPU
>> core.
>>
>> In top I see the average load is 4.10; before the restart it was 4.20.
>>
>> Average total bandwidth on the frontend interfaces is 250 Mbps, and of
>> course the number of ESTABLISHED and total connections is much higher now
>> (which is OK, there is plenty of RAM on this server):
>>
>> [root@router2 ~]#  lsof -ni | grep haproxy | grep ESTABLISHED | grep
>> xxx.xxx.xxx.xxx | wc -l
>>  6568
>> [root@router2 ~]#  lsof -ni | grep haproxy | grep xxx.xxx.xxx.xxx | wc -l
>>  6460
>
>
> Can you please send us the per-core usage (%usr, %sys, %si, ... from top for
> example) of your HAProxy box? How many RPS are you currently doing on the
> SSL frontend? Is this your only HAProxy server handling requests? Having
> something like ECMP towards multiple HAProxy servers would destroy the point
> of the SSL session cache.
>
>
>>
>> Interestingly, the interrupt % on CPU0 is almost the same, at 60%.
>>
>> Not much of a CPU load decrease after changing keep-alives; it looks like
>> it's something else.
>>
>>
>> On Sun, Apr 5, 2015 at 2:07 PM, Lukas Tribus  wrote:

 Hi all,

 haproxy is used for http and https load balancing, with TLS termination
 on the haproxy side.

 I'm using OpenBSD -stable on this box. The CPU gets saturated with
 250 Mbps of in/out traffic combined on the frontend NICs and 3000 ESTABLISHED
 connections on the frontend interface to haproxy.
>>>
>>>
>>>
>>> Remove:
>>> option http-server-close
>>> timeout http-keep-alive 1s
>>>
>>>
>>> and replace them with:
>>> option http-keep-alive
>>> option prefer-last-server
>>> timeout http-keep-alive 10s
>>>
>>>
>>>
>>> This will enable keep-alive mode with a 10-second timeout, which should
>>> decrease the CPU load by an order of magnitude.
>>>
>>> The problem with these SSL/TLS-terminating setups is the cost involved
>>> in the SSL/TLS handshake (the actual throughput doesn't really matter).
>>>
>>> Also, I suggest removing the "no-tls-tickets" option, so that your
>>> clients can use both SSL sessions and TLS tickets to resume an SSL/TLS
>>> session without starting a full handshake.
>>>
>>>
>>>
>>> Lukas
>>>
>
> Regards,
> Nenad



-- 
--
With regards,
Eugene Sudyr



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-05 Thread Nenad Merdanovic

Evgeniy,

On 4/5/2015 4:47 PM, Evgeniy Sudyr wrote:

Lukas, thank you for pointing out the possible keep-alive issues. I've
tested this before, but did it again just to make one more check!

I've increased the keep-alive timeout to 10s, removed
http-server-close, and restarted haproxy :)


Changes I've noted: haproxy dropped from avg 78% to avg 75% on each CPU core.

In top I see the average load is 4.10; before the restart it was 4.20.

Average total bandwidth on the frontend interfaces is 250 Mbps, and of
course the number of ESTABLISHED and total connections is much higher now
(which is OK, there is plenty of RAM on this server):

[root@router2 ~]#  lsof -ni | grep haproxy | grep ESTABLISHED | grep
xxx.xxx.xxx.xxx | wc -l
 6568
[root@router2 ~]#  lsof -ni | grep haproxy | grep xxx.xxx.xxx.xxx | wc -l
 6460


Can you please send us the per-core usage (%usr, %sys, %si, ... from top
for example) of your HAProxy box? How many RPS are you currently doing
on the SSL frontend? Is this your only HAProxy server handling requests?
Having something like ECMP towards multiple HAProxy servers would
destroy the point of the SSL session cache.
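
(If a stats socket is configured, the current rates are easy to read from
it; the socket path below is just an example and needs adjusting to your
setup:)

  echo "show info" | socat stdio /var/run/haproxy.sock | grep -i rate
  # per-frontend/backend counters, including the rate columns, come as CSV:
  echo "show stat" | socat stdio /var/run/haproxy.sock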




Interestingly, the interrupt % on CPU0 is almost the same, at 60%.

Not much of a CPU load decrease after changing keep-alives; it looks like
it's something else.


On Sun, Apr 5, 2015 at 2:07 PM, Lukas Tribus  wrote:

Hi all,

haproxy is used for http and https load balancing, with TLS termination
on the haproxy side.

I'm using OpenBSD -stable on this box. The CPU gets saturated with
250 Mbps of in/out traffic combined on the frontend NICs and 3000 ESTABLISHED
connections on the frontend interface to haproxy.



Remove:
option http-server-close
timeout http-keep-alive 1s


and replace them with:
option http-keep-alive
option prefer-last-server
timeout http-keep-alive 10s



This will enable keep-alive mode with a 10-second timeout, which should
decrease the CPU load by an order of magnitude.

The problem with these SSL/TLS-terminating setups is the cost involved
in the SSL/TLS handshake (the actual throughput doesn't really matter).

Also, I suggest removing the "no-tls-tickets" option, so that your clients
can use both SSL sessions and TLS tickets to resume an SSL/TLS session
without starting a full handshake.



Lukas



Regards,
Nenad



Re: CPU saturated with 250Mbps traffic on frontend

2015-04-05 Thread Evgeniy Sudyr
Lukas, thank you for pointing out the possible keep-alive issues. I've
tested this before, but did it again just to make one more check!

I've increased the keep-alive timeout to 10s, removed
http-server-close, and restarted haproxy :)


Changes I've noted: haproxy dropped from avg 78% to avg 75% on each CPU core.

In top I see the average load is 4.10; before the restart it was 4.20.

Average total bandwidth on the frontend interfaces is 250 Mbps, and of
course the number of ESTABLISHED and total connections is much higher now
(which is OK, there is plenty of RAM on this server):

[root@router2 ~]#  lsof -ni | grep haproxy | grep ESTABLISHED | grep
xxx.xxx.xxx.xxx | wc -l
6568
[root@router2 ~]#  lsof -ni | grep haproxy | grep xxx.xxx.xxx.xxx | wc -l
6460

Interestingly, the interrupt % on CPU0 is almost the same, at 60%.

Not much of a CPU load decrease after changing keep-alives; it looks like
it's something else.


On Sun, Apr 5, 2015 at 2:07 PM, Lukas Tribus  wrote:
>> Hi all,
>>
>> haproxy is used for http and https load balancing, with TLS termination
>> on the haproxy side.
>>
>> I'm using OpenBSD -stable on this box. The CPU gets saturated with
>> 250 Mbps of in/out traffic combined on the frontend NICs and 3000 ESTABLISHED
>> connections on the frontend interface to haproxy.
>
>
> Remove:
> option http-server-close
> timeout http-keep-alive 1s
>
>
> and replace them with:
> option http-keep-alive
> option prefer-last-server
> timeout http-keep-alive 10s
>
>
>
> This will enable keep-alive mode with a 10-second timeout, which should
> decrease the CPU load by an order of magnitude.
>
> The problem with these SSL/TLS-terminating setups is the cost involved
> in the SSL/TLS handshake (the actual throughput doesn't really matter).
>
> Also, I suggest removing the "no-tls-tickets" option, so that your clients
> can use both SSL sessions and TLS tickets to resume an SSL/TLS session
> without starting a full handshake.
>
>
>
> Lukas
>
>



-- 
--
With regards,
Eugene Sudyr



RE: CPU saturated with 250Mbps traffic on frontend

2015-04-05 Thread Lukas Tribus
> Hi all,
>
> haproxy is used for http and https load balancing, with TLS termination
> on the haproxy side.
>
> I'm using OpenBSD -stable on this box. The CPU gets saturated with
> 250 Mbps of in/out traffic combined on the frontend NICs and 3000 ESTABLISHED
> connections on the frontend interface to haproxy.


Remove:
option http-server-close
timeout http-keep-alive 1s


and replace them with:
option http-keep-alive
option prefer-last-server
timeout http-keep-alive 10s



This will enable keep-alive mode with a 10-second timeout, which should
decrease the CPU load by an order of magnitude.

The problem with these SSL/TLS-terminating setups is the cost involved
in the SSL/TLS handshake (the actual throughput doesn't really matter).

Also, I suggest removing the "no-tls-tickets" option, so that your clients
can use both SSL sessions and TLS tickets to resume an SSL/TLS session
without starting a full handshake.
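
Putting it together, the relevant parts of the configuration could look
roughly like this. This is only a sketch: the section names, certificate
path and backend name are placeholders, and the bind line simply omits
"no-tls-tickets":

  defaults
      mode http
      option http-keep-alive
      option prefer-last-server
      timeout http-keep-alive 10s
      # keep your existing timeout client/server/connect lines

  frontend https-in
      bind :443 ssl crt /etc/haproxy/site.pem
      default_backend web_servers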



Lukas