RES: RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
> OK. The last point could slightly help in reducing the number of calls to kqueue and aggregate more events at once. But FreeBSD's kqueue is really fast so that should not change much. You really need to be able to pin the processes to certain CPUs, as well as the interrupts. Unfortunately I cannot be of any help here :-(

But do you believe the CPU pinning will really make all this difference ? I know how to do it using pthread, because I am used to it; just a few lines of code are enough.
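To illustrate how little code pinning takes, here is a minimal sketch. It does not use the pthread API Fred mentions; it swaps in Python's `os.sched_setaffinity`, which Python only exposes on some platforms (Linux, not FreeBSD). The function name `pin_current_process` and the choice of the lowest-numbered CPU are assumptions made for the example.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to one CPU and return the resulting CPU set.

    Assumes a platform where Python exposes os.sched_setaffinity (Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # CPUs we are currently allowed to run on
    target = min(allowed)               # hypothetical choice: lowest-numbered CPU
    print("pinned to", pin_current_process(target))
    os.sched_setaffinity(0, allowed)    # restore the original CPU set
```

On FreeBSD the same effect would have to come from the system's own tools, such as the cpuset utility mentioned later in this thread.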
RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hey, Willy.

I've switched to haproxy 1.5 (the latest one available on the website), but the results didn't change much. However, I haven't yet tried running all the proxies in a single process to check the difference.

-----Original Message-----
From: Fred Pedrisa [mailto:fredhp...@hotmail.com]
Sent: Tuesday, 5 November 2013 13:33
To: 'Willy Tarreau'
Cc: 'Lukas Tribus'; 'haproxy@formilux.org'
Subject: RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

[snip]
Re: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello Fred,

[ first, please avoid top-posting, this is very cumbersome for replying in context afterwards, and tends to pollute subscribers mailboxes with overly large emails ]

> > Also, can you confirm that this is a real machine and that we're not troubleshooting a VM ?
>
> Yes, this is a 'real machine', running FreeBSD 9 x64. It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32 threads).

OK. Do you know if you have a single or multiple interrupts on your NICs, and if they're delivered to a single core, multiple cores, or floating around more or less randomly ?

> > That said, assuming you're dealing with 300 Mbps (about 40 MB/s) and say 500 bytes per message, this turns into 80k messages per second, which require :
> >   - 2 recvfrom()
> >   - 1 getsockopt() (we can remove this one, 1.5 doesn't have it)
> >   - 1 sendto()
> >
> > So 4 syscalls per message, resulting in 320k syscalls per second. It can start to represent some CPU usage. But there's more. Such small messages are transferred using TCP_NODELAY meaning that a TCP PUSH is set on each outgoing packet and that each of them is immediately ACKed. So you get 80kpps per side in each direction, resulting in 320kpps as well. If you have a firewall running on the system, it might take its share of load as well, which is possibly attributed to the sending process on outgoing messages.
> >
> > That said, even with that in mind, I still consider that the system load is high for the workload. Could you please share the output of vmstat 1 (just take the first 10 lines) ?
>
> Here is the vmstat 1 result :
>
> procs   memory  page                disks   faults              cpu
> r b w   avm fre flt re pi po  fr sr da0 pa0    in     sy     cs us sy id
> 7 0 0 4818M 35G 643  0  0  0 714  0   0   0  4977   1364   5996  8 25 67
> 3 0 0 4818M 35G 224  0  0  0 174  0   0   0 42698 355001 170303  8 22 71
> 3 0 0 4818M 35G 177  0  0  0 174  0   0   0 28715 383061 138108  7 23 69
> 4 0 0 4818M 35G 173  0  0  0 174  0   0   0 28342 375281 138067  8 24 69
> 5 0 0 4818M 35G 185  0  0  0 174  0   0   0 32900 372294 148576  7 21 71
> 5 0 0 4818M 35G 372  0  0  0 174  0   0   0 29112 364030 138826  7 25 68

It seems that your numbers below tend to confirm this model. I still don't know why you have that high a context switch rate. Are you running with more processes than CPUs ? Also it looks like the system is mostly spending its time idling. Is it that haproxy is on the same CPU as the network's interrupts ? Then maybe it could make sense to start multiple processes and pin them to specific CPU cores, and do the same with the interrupts. Delivering 500-bytes large messages between two NICs via userspace experiences a high overhead and everything which could be saved must be saved (including CPU cache misses).

> We are speaking about 100Kpps (input) and 140Kpps (output) 'approximately'.

OK, so probably about 30k msg/s in each direction with their respective ACKs. That just makes me think it could possibly do better since we can do better with HTTP messages. Do you have enough concurrent connections to fill the wire and ensure that the system never waits for either a client or a server ? I'm assuming that OK given the values assigned to the file descriptors in your latest email, which were up to 1428. With such numbers and that small messages, it can make sense to use multiple processes if that's not the case yet.

Best regards,
Willy
Re: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
On 5 November 2013 11:16, Willy Tarreau w...@1wt.eu wrote:
> > It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32 threads).
>
> OK. Do you know if you have a single or multiple interrupts on your NICs, and if they're delivered to a single core, multiple cores, or floating around more or less randomly ?

[snip]

> I still don't know why you have that high a context switch rate. Are you running with more processes than CPUs ?

Fred is running with at least 30 separate haproxy processes (as per his top output in message-id col129-ds31e074947100ad71da09cb0...@phx.gbl) and 16 real (32 H/T) cores.

I haven't seen a mail in this thread where Fred's shown that his problems persist after moving to a single haproxy instance.

/wood-for-the-trees :-)

Jonathan
RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
> OK. Do you know if you have a single or multiple interrupts on your NICs, and if they're delivered to a single core, multiple cores, or floating around more or less randomly ?

This is managed by FreeBSD; it currently has multiple queues and IRQ balancing with MSI-X.

> It seems that your numbers below tend to confirm this model. I still don't know why you have that high a context switch rate. Are you running with more processes than CPUs ? Also it looks like the system is mostly spending its time idling. Is it that haproxy is on the same CPU as the network's interrupts ? Then maybe it could make sense to start multiple processes and pin them to specific CPU cores, and do the same with the interrupts. Delivering 500-bytes large messages between two NICs via userspace experiences a high overhead and everything which could be saved must be saved (including CPU cache misses).

Yes, if we have 40 processes running and 16 physical cores, I suppose this is more than the number of physical cores available, right ? However, in FreeBSD we can't do that IRQ assigning, like we can on Linux (as far as I know).

> > We are speaking about 100Kpps (input) and 140Kpps (output) 'approximately'.
>
> OK, so probably about 30k msg/s in each direction with their respective ACKs. That just makes me think it could possibly do better since we can do better with HTTP messages. Do you have enough concurrent connections to fill the wire and ensure that the system never waits for either a client or a server ? I'm assuming that OK given the values assigned to the file descriptors in your latest email, which were up to 1428. With such numbers and that small messages, it can make sense to use multiple processes if that's not the case yet.

In theory yes, the connections are quick, because they are pure TCP applications and, in other cases, HTTP websites, but behind pure TCP mode instead of HTTP mode (not in all cases though).

Fred
Re: RES: RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
On 5 Nov 2013, at 19:33, Fred Pedrisa fredhp...@hotmail.com wrote:
> However, in FreeBSD we can't do that IRQ assigning, like we can on Linux (as far as I know).

JFYI: you can assign IRQs to CPUs via cpuset -x irq (I can’t tell you if it is “like on linux” or not though).
RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello, Willy.

Is there any alternative to strace ? I am on FreeBSD x64 right now.

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, 28 October 2013 03:37
To: Fred Pedrisa
Cc: 'Lukas Tribus'; haproxy@formilux.org
Subject: Re: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

Hi Fred,

On Mon, Oct 21, 2013 at 08:41:16PM -0200, Fred Pedrisa wrote:
> Hello,
>
> Ok.
>
> This is the top output :
>
>  2748 root 1 87 0 30396K 21656K CPU8    8  28.0H 49.17% haproxy
>  2726 root 1 45 0 38588K 32128K CPU24  16  21.1H 33.79% haproxy
>  2718 root 1 39 0 26300K 17464K kqread 28 807:21 29.98% haproxy
>  2752 root 1 38 0 30396K 21748K kqread 30 859:13 25.39% haproxy
>  2738 root 1 32 0 22204K 14896K kqread 11 796:36 20.65% haproxy
>  2740 root 1 31 0 34492K 27404K kqread 10 451:19 18.26% haproxy
>  2780 root 1 31 0 18108K  9416K kqread 31 568:38 17.77% haproxy
>  2732 root 1 29 0 34492K 27840K kqread  9 405:50 16.16% haproxy
>  2730 root 1 28 0 18108K 10868K kqread 15 463:21 15.38% haproxy
>  2764 root 1 29 0 18108K 10752K CPU15  15 441:34 14.60% haproxy
>  2760 root 1 27 0 18108K 11620K kqread 30 353:48 12.89% haproxy
>  2778 root 1 26 0 14012K  8360K kqread 29 407:07 12.16% haproxy
>  2756 root 1 26 0 34492K 26280K kqread  8 502:13  9.57% haproxy
>  2746 root 1 26 0 22204K 13036K kqread 29 350:32  9.57% haproxy
> 47408 root 1 25 0   158M   103M kqread 11 434:37  9.08% haproxy
>  2734 root 1 23 0 22204K 13704K kqread 15 384:14  6.69% haproxy
>  2722 root 1 23 0 14012K  5052K kqread 10 203:38  6.30% haproxy
>  2782 root 1 22 0 14012K  6352K kqread 13 208:07  4.98% haproxy
>  2744 root 1 21 0 18108K 12496K kqread 28 170:59  3.27% haproxy
>  2758 root 1 21 0 14012K  8320K kqread 29  71:17  2.69% haproxy
>  2768 root 1 20 0 14012K  5700K kqread 28  53:16  1.46% haproxy
>  2766 root 1 21 0 14012K  4868K kqread 21  88:39  1.27% haproxy
>  2724 root 1 20 0 14012K  7136K kqread 14  89:32  1.17% haproxy
>  2728 root 1 20 0 14012K  6520K kqread 30  65:21  1.17% haproxy
>  2716 root 1 20 0 14012K  5216K kqread 28  67:38  0.98% haproxy
>  2762 root 1 20 0  9916K  3936K kqread 30  39:16  0.68% haproxy
>  2720 root 1 20 0 14012K  8564K kqread 23 104:37  0.39% haproxy
>  2754 root 1 20 0 14012K  6312K kqread 22  80:37  0.39% haproxy
>  2736 root 1 20 0 14012K  5884K kqread 25  59:06  0.20% haproxy
>  2772 root 1 20 0 14012K  6984K kqread 10  73:54  0.10% haproxy
>  2770 root 1 20 0 34492K 25516K kqread 31 111:38  0.00% haproxy
>
> Right now, the load is around 12.45, sometimes going up to 16.00 +/-

I suspect something different. What type of protocol are you relaying ? Very often, people working in pure TCP mode transfer a lot of very small packets. And if you have 300 Mbps with many small packets, it can mean a lot of wakeups/sleep cycles with a very high syscall rate. You could check using strace -c on one of the highly loaded processes :

  strace -c -p 1248

Type Ctrl-C after one second, and check the numbers. I'd bet that you'll see a lot of send/recv calls.

How is the user vs system CPU usage ? If you're seeing a lot of user time, you may want to give a try to 1.5-dev19, it avoids calling process_session() as much as possible, saving a lot of CPU cycles in user space. If your CPU usage is mostly system, then it means that only tuning the system will help.

Regards,
Willy
RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello, Willy.

As you said, take a look :

getsockopt(0x12e,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(302,\^D\0\^V0\0\0^z\M-L-\a\0d8\0\0...,926,0x80,NULL,0x0) = 926 (0x39e)
recvfrom(682,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 988 (0x3dc)
recvfrom(682,0x801f3545c,7042,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a9,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(681,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,988,0x80,NULL,0x0) = 988 (0x3dc)
recvfrom(1428,\^N\0!\M-0\0\0\M-\\M^_\M-H-\^AoU...,8030,0x0,NULL,0x0) = 444 (0x1bc)
recvfrom(1428,0x8011b523c,7586,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x593,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1427,\^N\0!\M-0\0\0\M-\\M^_\M-H-\^AoU...,444,0x80,NULL,0x0) = 444 (0x1bc)
recvfrom(201,\b\0\\0\0\0\M-=\M-]\M-G-\^O\0\0...,8030,0x0,NULL,0x0) = 2627 (0xa43)
recvfrom(201,0x800ec5ac3,5403,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0xbf,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(191,\b\0\\0\0\0\M-=\M-]\M-G-\^O\0\0...,2627,0x80,NULL,0x0) = 2627 (0xa43)
recvfrom(888,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 1226 (0x4ca)
recvfrom(888,0x801ee354a,6804,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x377,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(887,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,1226,0x80,NULL,0x0) = 1226 (0x4ca)
recvfrom(674,\f\0\M-=\M-0\0\0\M^K}\M-#-d\r\0...,8030,0x0,NULL,0x0) = 982 (0x3d6)
recvfrom(674,0x800f6f456,7048,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a1,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(673,\f\0\M-=\M-0\0\0\M^K}\M-#-d\r\0...,982,0x80,NULL,0x0) = 982 (0x3d6)
recvfrom(1032,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 1205 (0x4b5)
recvfrom(1032,0x801ddb535,6825,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x407,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1031,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,1205,0x80,NULL,0x0) = 1205 (0x4b5)
recvfrom(1339,\v\0tpDa\^A\^DV \0\0\^A\M^R\M^K...,8030,0x0,NULL,0x0) = 68 (0x44)
recvfrom(1339,0x8011790c4,7962,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x53c,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(1340,\v\0tpDa\^A\^DV \0\0\^A\M^R\M^K...,68,0x80,NULL,0x0) = 68 (0x44)
recvfrom(913,\v\0tpj\M-h\^A\^D\M-Q\^]\0\0\^A...,8030,0x0,NULL,0x0) = 108 (0x6c)
recvfrom(913,0x8019090ec,7922,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x392,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(914,\v\0tpj\M-h\^A\^D\M-Q\^]\0\0\^A...,108,0x80,NULL,0x0) = 108 (0x6c)
recvfrom(166,\^D\0\^V0\0\0\M-$\M^@\M-L-\^T\0p...,8030,0x0,NULL,0x0) = 643 (0x283)
recvfrom(166,0x800f13303,7387,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'

So yes, a lot of recv/send calls as you said before.

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, 28 October 2013 03:37
To: Fred Pedrisa
Cc: 'Lukas Tribus'; haproxy@formilux.org
Subject: Re: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

[snip]
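For readers who want numbers like those `strace -c` would give, a trace in this format can be tallied by hand. The sketch below is only an illustration: `SAMPLE` holds four lines copied from Fred's excerpt (one relay cycle), and the name `tally_syscalls` and the regular expression are assumptions of this example, not part of any tool. ERR#35 is counted separately because it is the "Resource temporarily unavailable" result of the second, empty read.

```python
import re
from collections import Counter

# Four lines copied from the trace excerpt in this thread: one relay cycle.
SAMPLE = r"""
recvfrom(682,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,8030,0x0,NULL,0x0) = 988 (0x3dc)
recvfrom(682,0x801f3545c,7042,0x0,0x0,0x0) ERR#35 'Resource temporarily unavailable'
getsockopt(0x2a9,0x,0x1007,0x7fffdb94,0x7fffdb90,0x0) = 0 (0x0)
sendto(681,\^S\0W0\0\0\M-,\^?\M-L-\^P\0\^E@...,988,0x80,NULL,0x0) = 988 (0x3dc)
"""

# A syscall line starts with the call name followed by an opening parenthesis.
SYSCALL_RE = re.compile(r"^(\w+)\(")


def tally_syscalls(trace):
    """Count syscalls by name, plus how many of them failed with ERR#35."""
    calls = Counter()
    would_block = 0
    for line in trace.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if not match:
            continue  # skip blank or non-syscall lines
        calls[match.group(1)] += 1
        if "ERR#35" in line:
            would_block += 1
    return calls, would_block


if __name__ == "__main__":
    calls, would_block = tally_syscalls(SAMPLE)
    for name, count in calls.most_common():
        print(name, count)
    print("ERR#35 results:", would_block)
```

Run over a longer capture, the same tally shows how many of the reads return no data, which is the pattern visible throughout the excerpt.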
Re: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello Fred,

On Mon, Oct 28, 2013 at 10:02:15AM -0200, Fred Pedrisa wrote:
> Hello, Willy.
>
> As you said, take a look :
>
> [snip]
>
> So yes, a lot of recv/send calls as you said before.

Yes but they're not all that small. The average size looks like .5 or 1kB.

That said, assuming you're dealing with 300 Mbps (about 40 MB/s) and say 500 bytes per message, this turns into 80k messages per second, which require :
  - 2 recvfrom()
  - 1 getsockopt() (we can remove this one, 1.5 doesn't have it)
  - 1 sendto()

So 4 syscalls per message, resulting in 320k syscalls per second. It can start to represent some CPU usage. But there's more. Such small messages are transferred using TCP_NODELAY meaning that a TCP PUSH is set on each outgoing packet and that each of them is immediately ACKed. So you get 80kpps per side in each direction, resulting in 320kpps as well. If you have a firewall running on the system, it might take its share of load as well, which is possibly attributed to the sending process on outgoing messages.

That said, even with that in mind, I still consider that the system load is high for the workload. Could you please share the output of vmstat 1 (just take the first 10 lines) ?

Also, can you confirm that this is a real machine and that we're not troubleshooting a VM ?

It could make sense to try 1.5 (latest snapshot) for maybe the highest loaded process only if that makes the test easier and check if its CPU load drops or not.

Best regards,
Willy
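Willy's back-of-the-envelope estimate can be reproduced in a few lines. The sketch below is only a restatement of his arithmetic; the function name `relay_rates` and its parameters are made up for this example. It assumes, as he does, 4 syscalls per message and 4 packets per message (a data packet and its ACK on each side of the proxy).

```python
def relay_rates(mbps, msg_bytes, syscalls_per_msg=4, packets_per_msg=4):
    """Estimate per-second message, syscall and packet rates for a TCP relay."""
    bytes_per_s = mbps * 1_000_000 / 8      # megabits/s -> bytes/s
    msgs_per_s = bytes_per_s / msg_bytes    # one message relayed per msg_bytes
    return {
        "bytes_per_s": bytes_per_s,
        "msgs_per_s": msgs_per_s,
        "syscalls_per_s": msgs_per_s * syscalls_per_msg,
        "packets_per_s": msgs_per_s * packets_per_msg,
    }


if __name__ == "__main__":
    # 300 Mbps is 37.5 MB/s; Willy rounds this up to "about 40 MB/s".
    for mbps in (300, 320):
        print(mbps, relay_rates(mbps, 500))
```

With exactly 300 Mbps the figures come out at 75k messages and 300k syscalls per second; the 80k and 320k quoted above correspond to the rounded 40 MB/s.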
RES: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello, Willy.

Yes, this is a 'real machine', running FreeBSD 9 x64. It is a Xeon E5-2650 Dual (So we have 16 physical cores to use here and 32 threads).

We are speaking about 100Kpps (input) and 140Kpps (output) 'approximately'.

Here is the vmstat 1 result :

procs   memory  page                disks   faults              cpu
r b w   avm fre flt re pi po  fr sr da0 pa0    in     sy     cs us sy id
7 0 0 4818M 35G 643  0  0  0 714  0   0   0  4977   1364   5996  8 25 67
3 0 0 4818M 35G 224  0  0  0 174  0   0   0 42698 355001 170303  8 22 71
3 0 0 4818M 35G 177  0  0  0 174  0   0   0 28715 383061 138108  7 23 69
4 0 0 4818M 35G 173  0  0  0 174  0   0   0 28342 375281 138067  8 24 69
5 0 0 4818M 35G 185  0  0  0 174  0   0   0 32900 372294 148576  7 21 71
5 0 0 4818M 35G 372  0  0  0 174  0   0   0 29112 364030 138826  7 25 68
4 0 0 4818M 35G 159  0  0  0 174  0   0   0 34102 368835 150530  9 22 70
4 0 0 4818M 35G 362  0  0  0 174  0   0   0 39928 366139 165853  8 21 71
3 0 0 4818M 35G 220  0  0  0 174  0   0   0 39195 371933 163533  8 21 71
6 0 0 4818M 35G 262  0  0  0 174  0   0   0 42681 354697 172687  8 21 71

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, 28 October 2013 20:58
To: Fred Pedrisa
Cc: 'Lukas Tribus'; haproxy@formilux.org
Subject: Re: RES: RES: RES: RES: RES: RES: High CPU Usage (HaProxy)

[snip]
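To compare Fred's vmstat sample with the syscall estimate, the "faults" columns (in, sy, cs) can be averaged. The sketch below is an illustration only: the rows are copied from the vmstat output in this message with the fused columns separated, the name `mean_fault_rates` is made up, and the first row is skipped on the assumption that it is vmstat's since-boot summary rather than a one-second sample.

```python
# Rows copied from the vmstat 1 output in this message, fused columns separated.
VMSTAT_ROWS = """\
7 0 0 4818M 35G 643 0 0 0 714 0 0 0 4977 1364 5996 8 25 67
3 0 0 4818M 35G 224 0 0 0 174 0 0 0 42698 355001 170303 8 22 71
3 0 0 4818M 35G 177 0 0 0 174 0 0 0 28715 383061 138108 7 23 69
4 0 0 4818M 35G 173 0 0 0 174 0 0 0 28342 375281 138067 8 24 69
5 0 0 4818M 35G 185 0 0 0 174 0 0 0 32900 372294 148576 7 21 71
5 0 0 4818M 35G 372 0 0 0 174 0 0 0 29112 364030 138826 7 25 68
4 0 0 4818M 35G 159 0 0 0 174 0 0 0 34102 368835 150530 9 22 70
4 0 0 4818M 35G 362 0 0 0 174 0 0 0 39928 366139 165853 8 21 71
3 0 0 4818M 35G 220 0 0 0 174 0 0 0 39195 371933 163533 8 21 71
6 0 0 4818M 35G 262 0 0 0 174 0 0 0 42681 354697 172687 8 21 71
"""

# Zero-based positions of the interrupt, syscall and context-switch columns.
FAULT_COLUMNS = {"in": 13, "sy": 14, "cs": 15}


def mean_fault_rates(rows_text, skip_first=True):
    """Average the in/sy/cs columns, optionally skipping the first row."""
    rows = [line.split() for line in rows_text.splitlines() if line.strip()]
    if skip_first:
        rows = rows[1:]  # assumed to be the since-boot summary line
    return {
        name: sum(int(row[col]) for row in rows) / len(rows)
        for name, col in FAULT_COLUMNS.items()
    }


if __name__ == "__main__":
    for name, value in mean_fault_rates(VMSTAT_ROWS).items():
        print(f"{name}: {value:.0f} per second")
```

Over the nine one-second samples this gives roughly 368k syscalls and 154k context switches per second, the same order of magnitude as the 320k syscalls per second estimated earlier in the thread.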
RES: High CPU Usage (HaProxy)
Hello,

Are there any parameters that could be tuned to potentially help with the CPU usage ?

And the -sf parameter will kill the old process while keeping the new one, is that it ?

-----Original Message-----
From: Jeff Zellner [mailto:j...@olark.com]
Sent: Monday, 21 October 2013 14:54
To: Fred Pedrisa
Cc: haproxy@formilux.org
Subject: Re: High CPU Usage (HaProxy)

Hi Fred,

I imagine that your high load is due to running many instances of HAProxy, but hard to be 100% without all the information. Load indicates that there are processes waiting to execute, so by reducing the number of HAProxy processes you should see a reduced load (but still high cpu utilization).

You can accomplish the same effect as reloading settings by running another HAProxy process with the -sf parameter (this would potentially lead to more load, but could be mitigated by combining your many instances).

From the manpage:

  -sf pidlist
      Send FINISH signal to the pids in pidlist after startup. The processes which receive this signal will wait for all sessions to finish before exiting. This option must be specified last, followed by any number of PIDs. Technically speaking, SIGTTOU and SIGUSR1 are sent.

Cheers,
Jeff

On Mon, Oct 21, 2013 at 12:46 PM, Fred Pedrisa fredhp...@hotmail.com wrote:
> Hello,
>
> I am using many haproxy instances, for separated projects.
>
> This is causing a high cpu usage, and a high load in the OS up to 12.00 and so on.
>
> The question is, using just one instance, would reduce the CPU load, or it would make no difference at all ?
>
> Also, is there a way to just reload the settings of an already running instance ?
>
> Thanks !
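The reload Jeff describes amounts to starting a new haproxy with `-sf` and the old PIDs placed last on the command line. The sketch below only assembles such a command line; it does not run anything. The function name `build_reload_command` and the example paths are hypothetical, and `-f` (configuration file) and `-p` (pid file) are the usual haproxy options assumed here alongside the `-sf` quoted from the manpage.

```python
def build_reload_command(config_path, pidfile_path, old_pids):
    """Build an argv list that starts a new haproxy and asks old ones to finish.

    -sf must come last, followed by the PIDs of the processes to retire.
    """
    cmd = ["haproxy", "-f", config_path, "-p", pidfile_path]
    if old_pids:
        cmd.append("-sf")
        cmd.extend(str(pid) for pid in old_pids)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; the PIDs are two taken from the top output in this thread.
    argv = build_reload_command(
        "/usr/local/etc/haproxy.conf", "/var/run/haproxy.pid", [2748, 2726]
    )
    print(" ".join(argv))
```

Merging several per-project configurations into one file and reloading it this way would leave a single new process serving all new connections while the old ones drain.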
RES: High CPU Usage (HaProxy)
Hello,

So, I can run all my instances in just one process and work with it this way, by using -sf, right ?

-----Original Message-----
From: Jeff Zellner [mailto:j...@olark.com]
Sent: Monday, 21 October 2013 14:59
To: Fred Pedrisa
Cc: haproxy@formilux.org
Subject: Re: High CPU Usage (HaProxy)

-sf will not actually kill the old process(es) immediately. They'll end when they are no longer handling the currently connected clients they have, but the new process will handle all new connections.

On Mon, Oct 21, 2013 at 12:57 PM, Fred Pedrisa fredhp...@hotmail.com wrote:
> Hello,
>
> Are there any parameters that could be touched to potentially help with the cpu usage ?
>
> And the -sf parameter, will kill the old one while keeping the new one, that's it ?
>
> [snip]
RES: High CPU Usage (HaProxy)
Hello,

Would this cause a port conflict or anything like that ? Or, when you use -sf, does it automatically 'unbind' the port on the old process, leaving it only to the new one ?

-----Original Message-----
From: Jeff Zellner [mailto:j...@olark.com]
Sent: Monday, 21 October 2013 15:07
To: Fred Pedrisa
Cc: haproxy@formilux.org
Subject: Re: High CPU Usage (HaProxy)

Yes you could have all different applications in one process/configuration. When you 'reload' using -sf you'd have 2 processes for a while (depends on if you have long-lived connections or not). The old processes scheduled to die will typically take a lot less CPU, because they are handling only a couple connections and not any new ones.

On Mon, Oct 21, 2013 at 1:04 PM, Fred Pedrisa fredhp...@hotmail.com wrote:
> Hello,
>
> So, I can run all my instances in just one process and work with it this way, by using -sf right ?
>
> [snip]
RES: High CPU Usage (HaProxy)
Hi, Lukas.

We are speaking about: FreeBSD 9.2, dual Xeon E5-2650, 32 GB RAM. HAProxy 1.4 (latest). 30,000~35,000 concurrent connections. About 200~300 Megabit/s in total.

Sincerely, Fred

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 15:23
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: High CPU Usage (HaProxy)

Hi Fred,

> I am using many haproxy instances, for separated projects. This is causing a high cpu usage, and a high load in the OS up to 12.00 and so on. The question is, using just one instance, would reduce the CPU load, or it would make no difference at all ?

There is no way we can tell with the information you gave us. Please describe what exactly your architecture looks like, what hardware and software you have (OS, kernel and haproxy release), how you configured haproxy, how many concurrent connections you have and how much traffic you are forwarding. Also, I suggest you show us some top or ps outputs, as the load can easily come from the system as well.

Regards, Lukas
RE: RES: High CPU Usage (HaProxy)
Hi Fred,

> FreeBSD 9.2 - Dual Xeon E5-2650 - 32 GB RAM. Haproxy 1.4 (Latest) 30,000~35,000 concurrent connections. About 200~300 Megabit/s. In totality.

Alright, but we still need to know what haproxy does on this box. Can you post your configuration and explain what it does? We also need those ps or top outputs. Also, please tell us what NIC you are using.

Regards, Lukas
RES: RES: High CPU Usage (HaProxy)
Hello,

I am using a 10 Gbps Intel 520-DA2 NIC. The CPU usage in top varies per process; we have something like:

Haproxy - 93%
Haproxy - 85%
Haproxy - 50%
Haproxy - 43%
Haproxy - 32%
Haproxy - 20%
Haproxy - 15%
Haproxy - 5%
Haproxy - 1%

About 30-40 processes. I am just using it as a TCP proxy: basic functionality, no load balancing, no status checking or http mode at all. Just a simple backend: User - Haproxy - Destination.

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 19:13
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: High CPU Usage (HaProxy)

Hi Fred,

> FreeBSD 9.2 - Dual Xeon E5-2650 - 32 GB RAM. Haproxy 1.4 (Latest) 30,000~35,000 concurrent connections. About 200~300 Megabit/s. In totality.

Alright, but we still need to know what haproxy does on this box. Can you post your configuration and explain what it does? We also need those ps or top outputs. Also, please tell us what NIC you are using.

Regards, Lukas
RE: RES: RES: High CPU Usage (HaProxy)
Hi Fred,

> I am using a 10 Gbps Intel 520-DA2 NIC. The CPU usage in top varies per process; we have something like: Haproxy - 93%, Haproxy - 85%, Haproxy - 50%, Haproxy - 43%, Haproxy - 32%, Haproxy - 20%, Haproxy - 15%, Haproxy - 5%, Haproxy - 1%. About 30-40 processes. I am just using it as a TCP proxy: basic functionality, no load balancing, no status checking or http mode at all. Just a simple backend: User - Haproxy - Destination.

That's definitely not normal with that kind of traffic. Please post the output of haproxy -vv, and please do show us one of those haproxy configurations (with high CPU), even if they are seemingly simple.

I would suggest using a single haproxy instance (just one process).

Regards, Lukas
RES: RES: RES: High CPU Usage (HaProxy)
Hello,

Yes, this is why I was speaking with Jeff about this: I suppose that each of these processes has a default loop that uses a certain amount of CPU (the kqueue implementation).

Example config:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 16384
    daemon

defaults
    log global
    mode http
    option dontlognull
    option redispatch
    retries 3
    maxconn 16384
    contimeout 5000
    clitimeout 5
    srvtimeout 5

listen stats 185.30.164.40:1
    balance
    mode http
    stats enable

listen port_link_1
    mode tcp
    option tcplog
    option nolinger
    bind X.X.X.X:1433
    bind X.X.X.X:3500
    bind X.X.X.X:3800
    bind X.X.X.X:
    server link Y.Y.Y.Y source X.X.X.X

Output of -vv:

HA-Proxy version 1.4.23 2013/04/03
Copyright 2000-2013 Willy Tarreau w...@1wt.eu

Build options :
  TARGET = openbsd
  CPU = generic
  CC = gcc

Default settings :
  maxconn = 1024, bufsize = 8030, maxrewrite = 1030, maxpollevents = 200

Encrypted password support via crypt(3): no

Available polling systems :
  kqueue : pref=300, test result OK
  poll : pref=200, test result OK
  select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 19:40
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: RES: High CPU Usage (HaProxy)

Hi Fred,

> I am using a 10 Gbps Intel 520-DA2 NIC. The CPU usage in top varies per process; we have something like: Haproxy - 93%, Haproxy - 85%, Haproxy - 50%, Haproxy - 43%, Haproxy - 32%, Haproxy - 20%, Haproxy - 15%, Haproxy - 5%, Haproxy - 1%. About 30-40 processes. I am just using it as a TCP proxy: basic functionality, no load balancing, no status checking or http mode at all. Just a simple backend: User - Haproxy - Destination.

That's definitely not normal with that kind of traffic. Please post the output of haproxy -vv, and please do show us one of those haproxy configurations (with high CPU), even if they are seemingly simple.

I would suggest using a single haproxy instance (just one process).

Regards, Lukas
RE: RES: RES: RES: High CPU Usage (HaProxy)
Hi,

> Yes, this is why I was speaking with Jeff about this. Because I suppose that these processes have a default loop, that uses a certain amount of CPU (kQueue implementation)

It's not busy polling, if that's what you are referring to. CPU usage should be low with kqueue (because it's fully event based). I think that you may be facing some scheduling issues (like context switching) because of the number of haproxy instances you are running. I would really suggest running haproxy as a single instance and process.

> Would this cause a port conflict or anything like this ? Or when you use -sf, it automatically 'unbind' the port on the old process, allowing it only for the new one ?

HAProxy will take care of this; no conflicts are expected.

> HA-Proxy version 1.4.23 2013/04/03

1.4.23 is not the latest, 1.4.24 is. There are important bugfixes in 1.4.24, though none matches your symptoms.

Regards, Lukas
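To illustrate the difference between an event-based loop and busy polling mentioned in this reply, here is a small Python sketch. It is not HAProxy's code. It uses the standard selectors module, whose DefaultSelector picks kqueue on BSD systems and epoll on Linux; the helper name wait_for_events is made up for illustration.

```python
import selectors
import socket


def wait_for_events(sel, timeout):
    """Sleep in the kernel until a registered socket is readable or the timeout expires.

    Returns the list of ready sockets (empty if the timeout expired first).
    """
    return [key.fileobj for key, _mask in sel.select(timeout)]


if __name__ == "__main__":
    sel = selectors.DefaultSelector()      # kqueue on BSD, epoll on Linux
    sender, receiver = socket.socketpair()
    sel.register(receiver, selectors.EVENT_READ)

    # Nothing has been sent yet: the call blocks in the kernel and
    # returns an empty list, without spinning on the CPU meanwhile.
    print(wait_for_events(sel, 0.1))

    # Once data arrives, the same call returns with the ready socket.
    sender.send(b"ping")
    print(wait_for_events(sel, 0.1) == [receiver])

    sel.close()
    sender.close()
    receiver.close()
```

An idle process built this way shows up in top in a wait state such as kqread and accumulates almost no CPU time, which is why an idle loop by itself should not explain high usage.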
RES: RES: RES: RES: High CPU Usage (HaProxy)
Hi,

Yes, the current version (for my usage) is really stable. However, you are right: too many processes will create too many threads, given that I have just 16 physical cores... Do you expect a good decrease in CPU usage from switching to one process only?

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 20:08
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: RES: RES: High CPU Usage (HaProxy)

Hi,

> Yes, this is why I was speaking with Jeff about this. Because I suppose that these processes have a default loop, that uses a certain amount of CPU (kQueue implementation)

It's not busy polling, if that's what you are referring to. CPU usage should be low with kqueue (because it's fully event based). I think that you may be facing some scheduling issues (like context switching) because of the number of haproxy instances you are running. I would really suggest running haproxy as a single instance and process.

> Would this cause a port conflict or anything like this ? Or when you use -sf, it automatically 'unbind' the port on the old process, allowing it only for the new one ?

HAProxy will take care of this; no conflicts are expected.

> HA-Proxy version 1.4.23 2013/04/03

1.4.23 is not the latest, 1.4.24 is. There are important bugfixes in 1.4.24, though none matches your symptoms.

Regards, Lukas
RE: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hi,

> Yes, the current version (for my usage) is really stable. However, you are right, because too many processes, will create too many threads, assuming I have just 16 Physical Cores... Do you believe on a good CPU usage decrease, by switching to one process only ?

I can't guarantee it, but it's definitely a step in the right direction. The highest performance with the lowest load can be achieved when:

- using a single instance/process
- using kqueue (BSD) or epoll (Linux)
- pinning the process to a core
- pinning the system/NIC interrupts to another core on the same physical processor (so they can share the L2 cache)

But the load you are seeing is not just suboptimal, it's abnormal imo, so you are not looking for a little performance tweaking here and there, but to fix what is a major performance/load problem.

I believe there is a good chance that the performance is that bad because of scheduling/context switching problems caused by running more than one haproxy process.

Regards, Lukas
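As a rough illustration of the "pinning the process to a core" item above, here is a Python sketch. It relies on os.sched_setaffinity, which Python exposes on Linux and only on some other platforms, so the function returns None where it is missing; the name pin_to_cpu is made up for illustration. On FreeBSD the usual tool for the same job is cpuset(1), for example `cpuset -l <cpu> -p <pid>`. Interrupt pinning is not covered by this sketch.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the calling process) to a single CPU.

    Returns the resulting affinity set, or None on platforms where Python
    does not expose sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        # Pick a CPU this process is already allowed to run on.
        first_allowed = min(os.sched_getaffinity(0))
        print(pin_to_cpu(first_allowed))
    else:
        print("CPU affinity is not available through the os module here")
```

Once pinned, the scheduler no longer migrates the process between cores, which keeps its working set in that core's caches.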
RES: RES: RES: RES: RES: High CPU Usage (HaProxy)
Hello,

Ok. This is the top output:

 2748 root  1  87  0  30396K  21656K  CPU8     8   28.0H  49.17%  haproxy
 2726 root  1  45  0  38588K  32128K  CPU24   16   21.1H  33.79%  haproxy
 2718 root  1  39  0  26300K  17464K  kqread  28  807:21  29.98%  haproxy
 2752 root  1  38  0  30396K  21748K  kqread  30  859:13  25.39%  haproxy
 2738 root  1  32  0  22204K  14896K  kqread  11  796:36  20.65%  haproxy
 2740 root  1  31  0  34492K  27404K  kqread  10  451:19  18.26%  haproxy
 2780 root  1  31  0  18108K   9416K  kqread  31  568:38  17.77%  haproxy
 2732 root  1  29  0  34492K  27840K  kqread   9  405:50  16.16%  haproxy
 2730 root  1  28  0  18108K  10868K  kqread  15  463:21  15.38%  haproxy
 2764 root  1  29  0  18108K  10752K  CPU15   15  441:34  14.60%  haproxy
 2760 root  1  27  0  18108K  11620K  kqread  30  353:48  12.89%  haproxy
 2778 root  1  26  0  14012K   8360K  kqread  29  407:07  12.16%  haproxy
 2756 root  1  26  0  34492K  26280K  kqread   8  502:13   9.57%  haproxy
 2746 root  1  26  0  22204K  13036K  kqread  29  350:32   9.57%  haproxy
47408 root  1  25  0    158M    103M  kqread  11  434:37   9.08%  haproxy
 2734 root  1  23  0  22204K  13704K  kqread  15  384:14   6.69%  haproxy
 2722 root  1  23  0  14012K   5052K  kqread  10  203:38   6.30%  haproxy
 2782 root  1  22  0  14012K   6352K  kqread  13  208:07   4.98%  haproxy
 2744 root  1  21  0  18108K  12496K  kqread  28  170:59   3.27%  haproxy
 2758 root  1  21  0  14012K   8320K  kqread  29   71:17   2.69%  haproxy
 2768 root  1  20  0  14012K   5700K  kqread  28   53:16   1.46%  haproxy
 2766 root  1  21  0  14012K   4868K  kqread  21   88:39   1.27%  haproxy
 2724 root  1  20  0  14012K   7136K  kqread  14   89:32   1.17%  haproxy
 2728 root  1  20  0  14012K   6520K  kqread  30   65:21   1.17%  haproxy
 2716 root  1  20  0  14012K   5216K  kqread  28   67:38   0.98%  haproxy
 2762 root  1  20  0   9916K   3936K  kqread  30   39:16   0.68%  haproxy
 2720 root  1  20  0  14012K   8564K  kqread  23  104:37   0.39%  haproxy
 2754 root  1  20  0  14012K   6312K  kqread  22   80:37   0.39%  haproxy
 2736 root  1  20  0  14012K   5884K  kqread  25   59:06   0.20%  haproxy
 2772 root  1  20  0  14012K   6984K  kqread  10   73:54   0.10%  haproxy
 2770 root  1  20  0  34492K  25516K  kqread  31  111:38   0.00%  haproxy

Right now, the load is around 12.45, sometimes going up to about 16.00.

-----Original Message-----
From: Lukas Tribus [mailto:luky...@hotmail.com]
Sent: Monday, October 21, 2013 20:39
To: Fred Pedrisa; haproxy@formilux.org
Subject: RE: RES: RES: RES: RES: High CPU Usage (HaProxy)

Hi,

> Yes, the current version (for my usage) is really stable. However, you are right, because too many processes, will create too many threads, assuming I have just 16 Physical Cores... Do you believe on a good CPU usage decrease, by switching to one process only ?

I can't guarantee it, but it's definitely a step in the right direction. The highest performance with the lowest load can be achieved when:

- using a single instance/process
- using kqueue (BSD) or epoll (Linux)
- pinning the process to a core
- pinning the system/NIC interrupts to another core on the same physical processor (so they can share the L2 cache)

But the load you are seeing is not just suboptimal, it's abnormal imo, so you are not looking for a little performance tweaking here and there, but to fix what is a major performance/load problem.

I believe there is a good chance that the performance is that bad because of scheduling/context switching problems caused by running more than one haproxy process.

Regards, Lukas
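To put the top output in this message into perspective, the per-process CPU percentages can be added up. The sketch below assumes the second-to-last column of each row is top's WCPU value, with 100% meaning one fully busy CPU; the values are copied from the rows above, and the helper name cores_in_use is made up for illustration.

```python
# WCPU percentages copied from the top output above, one per haproxy process.
WCPU = [
    49.17, 33.79, 29.98, 25.39, 20.65, 18.26, 17.77, 16.16, 15.38, 14.60,
    12.89, 12.16, 9.57, 9.57, 9.08, 6.69, 6.30, 4.98, 3.27, 2.69,
    1.46, 1.27, 1.17, 1.17, 0.98, 0.68, 0.39, 0.39, 0.20, 0.10, 0.00,
]


def cores_in_use(wcpu_percentages):
    """Convert a list of per-process CPU percentages into 'CPUs fully busy'."""
    return sum(wcpu_percentages) / 100.0


if __name__ == "__main__":
    print(f"{len(WCPU)} processes, {sum(WCPU):.2f}% total, "
          f"about {cores_in_use(WCPU):.2f} CPUs' worth of work")
```

Under that assumption the 31 processes add up to about 326%, a little over three CPUs' worth of work, while the reported load average is around 12. That gap is in line with Lukas's suspicion that the cost lies in scheduling and context switching between many processes, not in the proxying work itself.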