RES: RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-23 Thread Fred Pedrisa
> OK. The last point could slightly help in reducing the number of calls to
> kqueue and aggregate more events at once. But FreeBSD's kqueue is really
> fast so that should not change much. You really need to be able to pin the
> processes to certain CPUs, as well as the interrupts. Unfortunately I cannot
> be of any help here :-(

But do you believe CPU pinning will really make that much of a difference? I
know how to do it using pthreads, since I am used to them; it only takes a
few lines of code.




RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-22 Thread Fred Pedrisa
Hey, Willy.

I've switched to haproxy 1.5 (the latest one available on the website), but
the results didn't change much.

However, I haven't yet tried running all the proxies in a single process to
check the difference.





Re: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-05 Thread Willy Tarreau
Hello Fred,

[ first, please avoid top-posting, this is very cumbersome for replying
  in context afterwards, and tends to pollute subscribers mailboxes with
  overly large emails ]

>> Also, can you confirm that this is a real machine and that we're not
>> troubleshooting a VM ?
>
> Yes, this is a 'real machine', running FreeBSD 9 x64.
>
> It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32
> threads).

OK. Do you know if you have a single or multiple interrupts on your NICs,
and if they're delivered to a single core, multiple cores, or floating
around more or less randomly ?

>> That said, assuming you're dealing with 300 Mbps (about 40 MB/s) and say
>> 500 bytes per message, this turns into 80k messages per second, which
>> require :
>>   - 2 recvfrom()
>>   - 1 getsockopt()  (we can remove this one, 1.5 doesn't have it)
>>   - 1 sendto()
>>
>> So 4 syscalls per message, resulting in 320k syscalls per second. It can
>> start to represent some CPU usage. But there's more. Such small messages are
>> transferred using TCP_NODELAY meaning that a TCP PUSH is set on each
>> outgoing packet and that each of them is immediately ACKed. So you get
>> 80kpps per side in each direction, resulting in 320kpps as well. If you have
>> a firewall running on the system, it might take its share of load as well,
>> which is possibly attributed to the sending process on outgoing messages.
>>
>> That said, even with that in mind, I still consider that the system load is
>> high for the workload. Could you please share the output of vmstat 1
>> (just take the first 10 lines) ?
>
> Here is the vmstat 1 result :
>
>  procs       memory   page                       disks    faults                   cpu
>  r b w    avm   fre   flt  re  pi  po   fr  sr da0 pa0     in      sy      cs us sy id
>  7 0 0  4818M   35G   643   0   0   0  714   0   0   0   4977    1364    5996  8 25 67
>  3 0 0  4818M   35G   224   0   0   0  174   0   0   0  42698  355001  170303  8 22 71
>  3 0 0  4818M   35G   177   0   0   0  174   0   0   0  28715  383061  138108  7 23 69
>  4 0 0  4818M   35G   173   0   0   0  174   0   0   0  28342  375281  138067  8 24 69
>  5 0 0  4818M   35G   185   0   0   0  174   0   0   0  32900  372294  148576  7 21 71
>  5 0 0  4818M   35G   372   0   0   0  174   0   0   0  29112  364030  138826  7 25 68

It seems that your numbers below tend to confirm this model.

I still don't know why you have that high a context switch rate. Are you
running with more processes than CPUs ? Also it looks like the system is
mostly spending its time idling. Is it that haproxy is on the same CPU as
the network's interrupts ? Then maybe it could make sense to start multiple
processes and pin them to specific CPU cores, and do the same with the
interrupts. Delivering 500-bytes large messages between two NICs via
userspace experiences a high overhead and everything which could be saved
must be saved (including CPU cache misses).
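
To put rough numbers on the context switch rate discussed above, here is a minimal Python sketch that averages the interrupt (in), syscall (sy) and context switch (cs) columns of the vmstat samples quoted in this thread. It skips the first sample on the assumption that it reports averages since boot rather than a one-second interval; the names `SAMPLES` and `column_means` are illustrative only.

```python
# (in, sy, cs) columns copied from the "vmstat 1" samples posted in this thread.
SAMPLES = [
    (4977, 1364, 5996),        # first line: assumed to be the since-boot average
    (42698, 355001, 170303),
    (28715, 383061, 138108),
    (28342, 375281, 138067),
    (32900, 372294, 148576),
    (29112, 364030, 138826),
]


def column_means(samples, skip_first=True):
    """Return the per-column mean of the samples, optionally dropping the first row."""
    rows = samples[1:] if skip_first else samples
    return tuple(sum(column) / len(rows) for column in zip(*rows))


mean_in, mean_sy, mean_cs = column_means(SAMPLES)
print(f"interrupts/s:       {mean_in:.0f}")
print(f"syscalls/s:         {mean_sy:.0f}")
print(f"context switches/s: {mean_cs:.0f}")
print(f"context switches per syscall: {mean_cs / mean_sy:.2f}")
```

On these five samples the ratio comes out at roughly 0.4 context switches per system call, which is the figure the question about running more processes than CPUs is trying to explain.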

> We are speaking about 100Kpps (input) and 140Kpps (output) 'approximately'.

OK, so probably about 30k msg/s in each direction with their respective ACKs.
That just makes me think it could possibly do better since we can do better
with HTTP messages.

Do you have enough concurrent connections to fill the wire and ensure that
the system never waits for either a client or a server ? I'm assuming that
OK given the values assigned to the file descriptors in your latest email,
which were up to 1428. With such numbers and that small messages, it can
make sense to use multiple processes if that's not the case yet.

Best regards,
Willy




Re: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-05 Thread Jonathan Matthews
On 5 November 2013 11:16, Willy Tarreau w...@1wt.eu wrote:
>> It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32
>> threads).
>
> OK. Do you know if you have a single or multiple interrupts on your NICs,
> and if they're delivered to a single core, multiple cores, or floating
> around more or less randomly ?
[snip]
>
> I still don't know why you have that high a context switch rate. Are you
> running with more processes than CPUs ?

Fred is running with at least 30 separate haproxy processes (as per
his top output in message-id
col129-ds31e074947100ad71da09cb0...@phx.gbl) and 16 real (32 H/T)
cores.

I haven't seen a mail in this thread where Fred's shown that his
problems persist after moving to a single haproxy instance.

/wood-for-the-trees :-)

Jonathan



RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-05 Thread Fred Pedrisa
> OK. Do you know if you have a single or multiple interrupts on your NICs,
> and if they're delivered to a single core, multiple cores, or floating
> around more or less randomly ?

This is managed by FreeBSD; it currently has multiple queues and IRQ
balancing with MSI-X.

> It seems that your numbers below tend to confirm this model.
>
> I still don't know why you have that high a context switch rate. Are you
> running with more processes than CPUs ? Also it looks like the system is
> mostly spending its time idling. Is it that haproxy is on the same CPU as
> the network's interrupts ? Then maybe it could make sense to start multiple
> processes and pin them to specific CPU cores, and do the same with the
> interrupts. Delivering 500-bytes large messages between two NICs via
> userspace experiences a high overhead and everything which could be saved
> must be saved (including CPU cache misses).

Yes: with 40 processes running on 16 physical cores, that is more processes
than physical cores available, right?

However, on FreeBSD we can't do that kind of IRQ assignment like we can on
Linux (as far as I know).

>> We are speaking about 100Kpps (input) and 140Kpps (output)
>> 'approximately'.
>
> OK, so probably about 30k msg/s in each direction with their respective
> ACKs.
> That just makes me think it could possibly do better since we can do
> better with HTTP messages.
>
> Do you have enough concurrent connections to fill the wire and ensure
> that the system never waits for either a client or a server ? I'm assuming
> that OK given the values assigned to the file descriptors in your latest
> email, which were up to 1428. With such numbers and that small messages, it
> can make sense to use multiple processes if that's not the case yet.

In theory yes. The connections are quick because they are mostly pure TCP
applications; in other cases they are HTTP websites, but proxied in pure TCP
mode instead of HTTP mode (though not in all cases).

Fred




Re: RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-11-05 Thread Dmitry Sivachenko
On 5 Nov 2013, at 19:33, Fred Pedrisa fredhp...@hotmail.com wrote:

> However, in FreeBSD we can't do that IRQ Assigning, like we can on linux.
> (As far I know).


JFYI: you can assign IRQs to CPUs via cpuset -x irq
(I can’t tell you if it is “like on linux” or not though).




RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-28 Thread Fred Pedrisa
Hello, Willy.

Is there any alternative to strace ? I am on FreeBSD x64 right now.

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, October 28, 2013 03:37
To: Fred Pedrisa
Cc: 'Lukas Tribus'; haproxy@formilux.org
Subject: Re: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

Hi Fred,

On Mon, Oct 21, 2013 at 08:41:16PM -0200, Fred Pedrisa wrote:
> Hello,
>
> Ok.
>
> This is the top output :
>
>  2748 root       1  87    0 30396K 21656K CPU8    8  28.0H 49.17% haproxy
>  2726 root       1  45    0 38588K 32128K CPU24  16  21.1H 33.79% haproxy
>  2718 root       1  39    0 26300K 17464K kqread 28 807:21 29.98% haproxy
>  2752 root       1  38    0 30396K 21748K kqread 30 859:13 25.39% haproxy
>  2738 root       1  32    0 22204K 14896K kqread 11 796:36 20.65% haproxy
>  2740 root       1  31    0 34492K 27404K kqread 10 451:19 18.26% haproxy
>  2780 root       1  31    0 18108K  9416K kqread 31 568:38 17.77% haproxy
>  2732 root       1  29    0 34492K 27840K kqread  9 405:50 16.16% haproxy
>  2730 root       1  28    0 18108K 10868K kqread 15 463:21 15.38% haproxy
>  2764 root       1  29    0 18108K 10752K CPU15  15 441:34 14.60% haproxy
>  2760 root       1  27    0 18108K 11620K kqread 30 353:48 12.89% haproxy
>  2778 root       1  26    0 14012K  8360K kqread 29 407:07 12.16% haproxy
>  2756 root       1  26    0 34492K 26280K kqread  8 502:13  9.57% haproxy
>  2746 root       1  26    0 22204K 13036K kqread 29 350:32  9.57% haproxy
> 47408 root       1  25    0   158M   103M kqread 11 434:37  9.08% haproxy
>  2734 root       1  23    0 22204K 13704K kqread 15 384:14  6.69% haproxy
>  2722 root       1  23    0 14012K  5052K kqread 10 203:38  6.30% haproxy
>  2782 root       1  22    0 14012K  6352K kqread 13 208:07  4.98% haproxy
>  2744 root       1  21    0 18108K 12496K kqread 28 170:59  3.27% haproxy
>  2758 root       1  21    0 14012K  8320K kqread 29  71:17  2.69% haproxy
>  2768 root       1  20    0 14012K  5700K kqread 28  53:16  1.46% haproxy
>  2766 root       1  21    0 14012K  4868K kqread 21  88:39  1.27% haproxy
>  2724 root       1  20    0 14012K  7136K kqread 14  89:32  1.17% haproxy
>  2728 root       1  20    0 14012K  6520K kqread 30  65:21  1.17% haproxy
>  2716 root       1  20    0 14012K  5216K kqread 28  67:38  0.98% haproxy
>  2762 root       1  20    0  9916K  3936K kqread 30  39:16  0.68% haproxy
>  2720 root       1  20    0 14012K  8564K kqread 23 104:37  0.39% haproxy
>  2754 root       1  20    0 14012K  6312K kqread 22  80:37  0.39% haproxy
>  2736 root       1  20    0 14012K  5884K kqread 25  59:06  0.20% haproxy
>  2772 root       1  20    0 14012K  6984K kqread 10  73:54  0.10% haproxy
>  2770 root       1  20    0 34492K 25516K kqread 31 111:38  0.00% haproxy
>
> Right now, the load is around 12.45, sometimes going up to 16.00 +/-

I suspect something different. What type of protocol are you relaying ?
Very often, people working in pure TCP mode transfer a lot of very small
packets. And if you have 300 Mbps with many small packets, it can mean a lot
of wakeups/sleep cycles with a very high syscall rate.

You could check using strace -c on one of the highly loaded processes :

   strace -c -p 1248

Type Ctrl-C after one second, and check the numbers. I'd bet that you'll see
a lot of send/recv calls.

How is the user vs system CPU usage ? If you're seeing a lot of user time,
you may want to give a try to 1.5-dev19, it avoids calling process_session()
as much as possible, saving a lot of CPU cycles in user space. If your CPU
usage is mostly system, then it means that only tuning the system will help.

Regards,
Willy





RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-28 Thread Fred Pedrisa
Hello, Willy.

As you said, take a look :

getsockopt(0x12e,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(302,\^D\0\^V0\0\0^z\M-L-\a\0d8\0\0...,926,0x80,NULL,0x0) = 926 (0x39e)
recvfrom(682,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 988 (0x3dc)
recvfrom(682,0x801f3545c,7042,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a9,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(681,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,988,0x80,NULL,0x0) = 988 (0x3dc)
recvfrom(1428,\^N\0!\M-0\0\0\M-\\M^_\M-H-\^AoU...,8030,0x0,NULL,0x0) = 444 (0x1bc)
recvfrom(1428,0x8011b523c,7586,0x0,0x0,0x0)  ERR#35 'Resource temporarily unavailable'
getsockopt(0x593,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1427,\^N\0!\M-0\0\0\M-\\M^_\M-H-\^AoU...,444,0x80,NULL,0x0) = 444 (0x1bc)
recvfrom(201,\b\0\\0\0\0\M-=\M-]\M-G-\^O\0\0...,8030,0x0,NULL,0x0) = 2627 (0xa43)
recvfrom(201,0x800ec5ac3,5403,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0xbf,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(191,\b\0\\0\0\0\M-=\M-]\M-G-\^O\0\0...,2627,0x80,NULL,0x0) = 2627 (0xa43)
recvfrom(888,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 1226 (0x4ca)
recvfrom(888,0x801ee354a,6804,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0x377,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(887,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,1226,0x80,NULL,0x0) = 1226 (0x4ca)
recvfrom(674,\f\0\M-=\M-0\0\0\M^K}\M-#-d\r\0...,8030,0x0,NULL,0x0) = 982 (0x3d6)
recvfrom(674,0x800f6f456,7048,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a1,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(673,\f\0\M-=\M-0\0\0\M^K}\M-#-d\r\0...,982,0x80,NULL,0x0) = 982 (0x3d6)
recvfrom(1032,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 1205 (0x4b5)
recvfrom(1032,0x801ddb535,6825,0x0,0x0,0x0)  ERR#35 'Resource temporarily unavailable'
getsockopt(0x407,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1031,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,1205,0x80,NULL,0x0) = 1205 (0x4b5)
recvfrom(1339,\v\0tpDa\^A\^DV \0\0\^A\M^R\M^K...,8030,0x0,NULL,0x0) = 68 (0x44)
recvfrom(1339,0x8011790c4,7962,0x0,0x0,0x0)  ERR#35 'Resource temporarily unavailable'
getsockopt(0x53c,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1340,\v\0tpDa\^A\^DV \0\0\^A\M^R\M^K...,68,0x80,NULL,0x0) = 68 (0x44)
recvfrom(913,\v\0tpj\M-h\^A\^D\M-Q\^]\0\0\^A...,8030,0x0,NULL,0x0) = 108 (0x6c)
recvfrom(913,0x8019090ec,7922,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0x392,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(914,\v\0tpj\M-h\^A\^D\M-Q\^]\0\0\^A...,108,0x80,NULL,0x0) = 108 (0x6c)
recvfrom(166,\^D\0\^V0\0\0\M-$\M^@\M-L-\^T\0p...,8030,0x0,NULL,0x0) = 643 (0x283)
recvfrom(166,0x800f13303,7387,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'

So yes, a lot of recv/send calls as you said before.
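
A trace like the one above can be summarized in the spirit of `strace -c`. The following is a minimal Python sketch, assuming one call per line in the format shown in this message (payloads are abbreviated to `...` in the sample); `CALL_RE` and `summarize` are hypothetical names, and the regular expression is only meant to cope with this particular output, not with every trace format.

```python
import re
from collections import Counter

# Matches "name(args) = N ..." or "name(args)   ERR#N ..." as seen in the trace above.
CALL_RE = re.compile(r"^(\w+)\(.*\)\s+(?:= (\d+)|ERR#(\d+))")

# A few lines taken from the trace, with the binary payloads abbreviated.
SAMPLE = r"""
getsockopt(0x12e,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(302,...,926,0x80,NULL,0x0) = 926 (0x39e)
recvfrom(682,...,8030,0x0,NULL,0x0) = 988 (0x3dc)
recvfrom(682,0x801f3545c,7042,0x0,0x0,0x0)   ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a9,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(681,...,988,0x80,NULL,0x0) = 988 (0x3dc)
""".strip().splitlines()


def summarize(lines):
    """Count calls per syscall name and collect the sizes returned by sendto()."""
    counts = Counter()
    sent_sizes = []
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match is None:
            continue  # not a syscall line
        name, ret, _err = match.groups()
        counts[name] += 1
        if name == "sendto" and ret is not None:
            sent_sizes.append(int(ret))
    return counts, sent_sizes


counts, sent_sizes = summarize(SAMPLE)
print(dict(counts))
print("average sendto size:", sum(sent_sizes) / len(sent_sizes))
```

Each forwarded message shows the same pattern: one successful recvfrom(), one recvfrom() failing with ERR#35, one getsockopt() and one sendto().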


Re: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-28 Thread Willy Tarreau
Hello Fred,

On Mon, Oct 28, 2013 at 10:02:15AM -0200, Fred Pedrisa wrote:
> Hello, Willy.
>
> As you said, take a look :
>
[snip]
>
> So yes, a lot of recv/send calls as you said before.

Yes but they're not all that small. The average size looks like .5 or 1kB.
That said, assuming you're dealing with 300 Mbps (about 40 MB/s) and say
500 bytes per message, this turns into 80k messages per second, which
require :
  - 2 recvfrom()
  - 1 getsockopt()  (we can remove this one, 1.5 doesn't have it)
  - 1 sendto()

So 4 syscalls per message, resulting in 320k syscalls per second. It can
start to represent some CPU usage. But there's more. Such small messages
are transferred using TCP_NODELAY meaning that a TCP PUSH is set on each
outgoing packet and that each of them is immediately ACKed. So you get
80kpps per side in each direction, resulting in 320kpps as well. If you
have a firewall running on the system, it might take its share of load
as well, which is possibly attributed to the sending process on outgoing
messages.
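
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this message, using its rounded figures (about 40 MB/s, 500 bytes per message, 4 syscalls per message); the function name `proxy_load` and its parameters are illustrative.

```python
def proxy_load(bytes_per_sec, msg_bytes, syscalls_per_msg=4):
    """Estimate messages/s, syscalls/s and packets/s for a TCP proxy."""
    msgs = bytes_per_sec / msg_bytes
    syscalls = msgs * syscalls_per_msg
    # Each message is one data packet plus one ACK (two directions),
    # and it is seen once on the client side and once on the server side.
    packets = msgs * 2 * 2
    return msgs, syscalls, packets


msgs, syscalls, packets = proxy_load(bytes_per_sec=40e6, msg_bytes=500)
print(f"{msgs:.0f} msg/s, {syscalls:.0f} syscalls/s, {packets:.0f} packets/s")
```

With 2 recvfrom(), 1 getsockopt() and 1 sendto() per message, dropping the getsockopt() (as 1.5 does) would bring `syscalls_per_msg` down to 3.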

That said, even with that in mind, I still consider that the system load
is high for the workload. Could you please share the output of vmstat 1
(just take the first 10 lines) ? Also, can you confirm that this is a real
machine and that we're not troubleshooting a VM ?

It could make sense to try 1.5 (latest snapshot) for maybe the highest
loaded process only if that makes the test easier and check if its CPU
load drops or not.

Best regards,
Willy




RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-28 Thread Fred Pedrisa
Hello, Willy.

Yes, this is a 'real machine', running FreeBSD 9 x64.

It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32
threads).

We are speaking about 100Kpps (input) and 140Kpps (output) 'approximately'.

Here is the vmstat 1 result :

 procs       memory   page                       disks    faults                   cpu
 r b w    avm   fre   flt  re  pi  po   fr  sr da0 pa0     in      sy      cs us sy id
 7 0 0  4818M   35G   643   0   0   0  714   0   0   0   4977    1364    5996  8 25 67
 3 0 0  4818M   35G   224   0   0   0  174   0   0   0  42698  355001  170303  8 22 71
 3 0 0  4818M   35G   177   0   0   0  174   0   0   0  28715  383061  138108  7 23 69
 4 0 0  4818M   35G   173   0   0   0  174   0   0   0  28342  375281  138067  8 24 69
 5 0 0  4818M   35G   185   0   0   0  174   0   0   0  32900  372294  148576  7 21 71
 5 0 0  4818M   35G   372   0   0   0  174   0   0   0  29112  364030  138826  7 25 68
 4 0 0  4818M   35G   159   0   0   0  174   0   0   0  34102  368835  150530  9 22 70
 4 0 0  4818M   35G   362   0   0   0  174   0   0   0  39928  366139  165853  8 21 71
 3 0 0  4818M   35G   220   0   0   0  174   0   0   0  39195  371933  163533  8 21 71
 6 0 0  4818M   35G   262   0   0   0  174   0   0   0  42681  354697  172687  8 21 71


RES: RES: RES: High CPU Usage (HaProxy)

2013-10-21 Thread Fred Pedrisa
Hello,

Yes, this is why I was speaking with Jeff about this.

Because I suppose that these processes have a default loop, that uses a
certain amount of CPU (kQueue implementation)

Example config :

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 16384
    daemon

defaults
    log global
    mode    http
    option  dontlognull
    option  redispatch
    retries 3
    maxconn 16384
    contimeout  5000
    clitimeout  5
    srvtimeout  5

listen stats 185.30.164.40:1
    balance
    mode http
    stats enable

listen port_link_1
    mode tcp
    option tcplog
    option nolinger
    bind X.X.X.X:1433
    bind X.X.X.X:3500
    bind X.X.X.X:3800
    bind X.X.X.X:
    server link Y.Y.Y.Y
    source X.X.X.X
Output of -vv :

HA-Proxy version 1.4.23 2013/04/03
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = openbsd
  CPU = generic
  CC  = gcc

Default settings :
  maxconn = 1024, bufsize = 8030, maxrewrite = 1030, maxpollevents = 200

Encrypted password support via crypt(3): no

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 19:40
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: RES: High CPU Usage (HaProxy)

Hi Fred,

> I am using a 10 Gbps Intel 520-DA2 NIC.
>
> The cpu usage in top vary per process we have something like :
>
> Haproxy - 93%
> Haproxy - 85%
> Haproxy - 50%
> Haproxy - 43%
> Haproxy - 32%
> Haproxy - 20%
> Haproxy - 15%
> Haproxy - 5%
> Haproxy - 1%
>
> About 30-40 Processes.
>
> I am just using it as a tcp proxy, basic functionality, no load
> balancing, no status checking or http mode at all.
>
> Just a simple backend :
>
> User - Haproxy - Destination.

That's definitely not normal with that kind of traffic.

Please post the output of haproxy -vv and please do show us one of those
haproxy configurations (with high CPU), even if they are seemingly simple.

I would suggest using a single haproxy instance (just one process).



Regards,

Lukas 




RE: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-21 Thread Lukas Tribus
Hi,


> Yes, this is why I was speaking with Jeff about this.
>
> Because I suppose that these processes have a default loop, that uses a
> certain amount of CPU (kQueue implementation)

It's not busy polling, if that's what you are referring to. CPU usage should
be low with kqueue (because it's fully event based).

I think that you may face some scheduling issues (like context switching),
because of the number of haproxy instances you are running.

I would really suggest running haproxy as a single instance and process.



> Would this cause a port conflict or anything like this ?
>
> Or when you use -sf, it automatically 'unbind' the port on the old process,
> allowing it only for the new one ?

HAProxy will take care of this, no conflicts are expected.



> HA-Proxy version 1.4.23 2013/04/03

1.4.23 is not the latest, 1.4.24 is. Important bugfixes are in 1.4.24, though
none matches your symptoms.



Regards,

Lukas 


RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-21 Thread Fred Pedrisa
Hi,

Yes, the current version (for my usage) is really stable.

However, you are right: too many processes will create too many threads,
given that I have just 16 physical cores...

Do you expect a good decrease in CPU usage from switching to one process
only?





RE: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-21 Thread Lukas Tribus
Hi,


 Yes, the current version (for my usage) is really stable.

 However, you are right: too many processes will create too many threads,
 given that I have just 16 physical cores...

 Do you expect a significant decrease in CPU usage from switching to a
 single process?

I can't guarantee it, but it's definitely a step in the right direction.

The highest performance with the lowest load can be achieved when:
- using a single instance/process
- using kqueue (BSD) or epoll (Linux)
- pinning the process to a core
- pinning the system/NIC interrupts to another core on the same physical
  processor (so they can share the L2 cache)
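
As a rough illustration of the "pinning the process to a core" item, here is a
minimal Python sketch. It is an assumption-laden stand-in, not what HAProxy
does internally: it uses os.sched_setaffinity, which Python exposes only on
platforms that provide it (Linux does), and the helper name
pin_current_process is hypothetical. On FreeBSD, the system discussed in this
thread, the usual tool for the same job is cpuset(1).

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU.

    Returns the resulting set of allowed CPUs, or None on platforms where
    Python does not expose os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        # Pick a CPU the process is already allowed to run on.
        target = min(os.sched_getaffinity(0))
        print("now restricted to:", pin_current_process(target))
    else:
        print("CPU affinity is not exposed by Python on this platform")
```

Pinning only helps if the NIC interrupts are placed deliberately as well, as
the last item in the list says; that part is done with OS tools and is not
covered by this sketch.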


But the load you are seeing is not just suboptimal, it's abnormal IMO. You
are not looking for a little performance tweaking here and there, but for a
fix to what is a major performance/load problem.

I believe there is a good chance that the performance is that bad because of
scheduling/context-switching problems caused by running more than one haproxy
process.



Regards,

Lukas 


RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

2013-10-21 Thread Fred Pedrisa
Hello,

Ok.

This is the top output:

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2748 root       1  87    0 30396K 21656K CPU8    8  28.0H 49.17% haproxy
 2726 root       1  45    0 38588K 32128K CPU24  16  21.1H 33.79% haproxy
 2718 root       1  39    0 26300K 17464K kqread 28 807:21 29.98% haproxy
 2752 root       1  38    0 30396K 21748K kqread 30 859:13 25.39% haproxy
 2738 root       1  32    0 22204K 14896K kqread 11 796:36 20.65% haproxy
 2740 root       1  31    0 34492K 27404K kqread 10 451:19 18.26% haproxy
 2780 root       1  31    0 18108K  9416K kqread 31 568:38 17.77% haproxy
 2732 root       1  29    0 34492K 27840K kqread  9 405:50 16.16% haproxy
 2730 root       1  28    0 18108K 10868K kqread 15 463:21 15.38% haproxy
 2764 root       1  29    0 18108K 10752K CPU15  15 441:34 14.60% haproxy
 2760 root       1  27    0 18108K 11620K kqread 30 353:48 12.89% haproxy
 2778 root       1  26    0 14012K  8360K kqread 29 407:07 12.16% haproxy
 2756 root       1  26    0 34492K 26280K kqread  8 502:13  9.57% haproxy
 2746 root       1  26    0 22204K 13036K kqread 29 350:32  9.57% haproxy
47408 root       1  25    0   158M   103M kqread 11 434:37  9.08% haproxy
 2734 root       1  23    0 22204K 13704K kqread 15 384:14  6.69% haproxy
 2722 root       1  23    0 14012K  5052K kqread 10 203:38  6.30% haproxy
 2782 root       1  22    0 14012K  6352K kqread 13 208:07  4.98% haproxy
 2744 root       1  21    0 18108K 12496K kqread 28 170:59  3.27% haproxy
 2758 root       1  21    0 14012K  8320K kqread 29  71:17  2.69% haproxy
 2768 root       1  20    0 14012K  5700K kqread 28  53:16  1.46% haproxy
 2766 root       1  21    0 14012K  4868K kqread 21  88:39  1.27% haproxy
 2724 root       1  20    0 14012K  7136K kqread 14  89:32  1.17% haproxy
 2728 root       1  20    0 14012K  6520K kqread 30  65:21  1.17% haproxy
 2716 root       1  20    0 14012K  5216K kqread 28  67:38  0.98% haproxy
 2762 root       1  20    0  9916K  3936K kqread 30  39:16  0.68% haproxy
 2720 root       1  20    0 14012K  8564K kqread 23 104:37  0.39% haproxy
 2754 root       1  20    0 14012K  6312K kqread 22  80:37  0.39% haproxy
 2736 root       1  20    0 14012K  5884K kqread 25  59:06  0.20% haproxy
 2772 root       1  20    0 14012K  6984K kqread 10  73:54  0.10% haproxy
 2770 root       1  20    0 34492K 25516K kqread 31 111:38  0.00% haproxy

Right now, the load average is around 12.45, sometimes going up to about 16.00.
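
To put the listing in perspective, here is a small Python sketch that adds up
the per-process CPU percentages shown above and compares the process count
with the 16 physical cores mentioned earlier in the thread. The values are
copied from the listing; the names CPU_PERCENT, PHYSICAL_CORES and summarize
are hypothetical, and reading each percentage as a share of one CPU is an
assumption about how top reports it.

```python
# Per-process CPU percentages copied from the top listing, in order.
CPU_PERCENT = [
    49.17, 33.79, 29.98, 25.39, 20.65, 18.26, 17.77, 16.16,
    15.38, 14.60, 12.89, 12.16, 9.57, 9.57, 9.08, 6.69,
    6.30, 4.98, 3.27, 2.69, 1.46, 1.27, 1.17, 1.17,
    0.98, 0.68, 0.39, 0.39, 0.20, 0.10, 0.00,
]

PHYSICAL_CORES = 16  # figure stated by the poster, not measured here


def summarize(cpu_percent, cores):
    """Return the process count, busy-core equivalent and processes per core."""
    total_percent = sum(cpu_percent)
    return {
        "processes": len(cpu_percent),
        # 100% is taken to mean one fully busy CPU (assumption).
        "cores_busy": total_percent / 100.0,
        "processes_per_core": len(cpu_percent) / cores,
    }


if __name__ == "__main__":
    for key, value in summarize(CPU_PERCENT, PHYSICAL_CORES).items():
        print(f"{key}: {value:.2f}")
```

The summed CPU usage comes to only a few cores' worth, well below the reported
load average, while there are roughly two haproxy processes per physical core.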

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 20:39
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: RES: RES: RES: High CPU Usage (HaProxy)

Hi,


 Yes, the current version (for my usage) is really stable.

 However, you are right: too many processes will create too many threads,
 given that I have just 16 physical cores...

 Do you expect a significant decrease in CPU usage from switching to a
 single process?

I can't guarantee it, but it's definitely a step in the right direction.

The highest performance with the lowest load can be achieved when:
- using a single instance/process
- using kqueue (BSD) or epoll (Linux)
- pinning the process to a core
- pinning the system/NIC interrupts to another core on the same physical
  processor (so they can share the L2 cache)


But the load you are seeing is not just suboptimal, it's abnormal IMO. You
are not looking for a little performance tweaking here and there, but for a
fix to what is a major performance/load problem.

I believe there is a good chance that the performance is that bad because of
scheduling/context-switching problems caused by running more than one haproxy
process.



Regards,

Lukas