On Mon, Jun 4, 2012 at 8:40 AM, Marc Lehmann <schm...@schmorp.de> wrote:
> On Mon, Jun 04, 2012 at 02:44:19PM +0200, Joachim Nilsson <troglo...@gmail.com> wrote:
>> On Mon, Jun 4, 2012 at 2:29 PM, Marc Lehmann <schm...@schmorp.de> wrote:
>> > On Mon, Jun 04, 2012 at 08:41:38AM +0800, 钱晓明 <mailtoanta...@163.com> wrote:
>> >> If there are more than one event loop on same udp socket, and each of
>> >> them in different thread, datagrams will be processed by all threads
>> >> concurrently? For example, one thread is processing udp message while
> If the processing of the udp packet takes comparatively long, then you will
> have few threads selecting on the fd (in the best case only one on average),
> because the others are busy processing something else.

In the specific case of connectionless UDP packets, if the per-packet
processing in your daemon is already well optimized and fast, then on
Linux hosts I've found that the Linux socket code itself becomes the
scaling limit.  One of my projects is an open source authoritative DNS
server where I did a lot of perf testing on this scenario.  Spawning
multiple threads looping on a single socket doesn't help, as they all
end up waiting on some lower-level per-socket serialization in the
kernel, so you get basically the same throughput with 8 threads as you
do with 1, even if you have a bunch of CPU cores to use.

But you can scale up to more packets/sec on a given machine by running
one thread per socket (and one socket per core) and distributing the
packets across these sockets by some other mechanism.  For example, in
the DNS case with an 8-core server, you might use a front-end hardware
load balancer (or some Linux iptables/ipvsadm type hacks) to
redistribute the incoming UDP packets on port 53 to local ports
1060-1067, run a thread+loop per socket on each of your 8 cores
listening on those ports, and of course have the LB stuff remap the
responses to source port 53 afterwards as well.  Then you can see real
scaling beyond anything you might try to do against a single UDP socket.
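
To make the layout concrete, here is a minimal sketch of the
thread-per-socket structure, written in Python only for brevity.  The
names serve, start_workers, stop_workers and the handle callback are
all made up for illustration; a real daemon would do this in C with one
libev loop per thread, and CPython threads won't show the per-core
scaling anyway.  The point is only that each socket is read by exactly
one thread.

```python
import socket
import threading


def serve(sock, handle, stop):
    """Receive loop owned by exactly one thread; no other thread reads sock."""
    sock.settimeout(0.2)  # wake up periodically to check the stop flag
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(4096)
        except socket.timeout:
            continue
        except OSError:
            break  # socket was closed underneath us
        # Reply from the same socket the request arrived on.
        sock.sendto(handle(data), addr)


def start_workers(host, ports, handle):
    """Bind one UDP socket per port and give each socket its own thread."""
    stop = threading.Event()
    workers = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((host, port))  # port 0 lets the OS pick a free port
        thread = threading.Thread(
            target=serve, args=(sock, handle, stop), daemon=True
        )
        thread.start()
        workers.append((sock, thread))
    return stop, workers


def stop_workers(stop, workers):
    """Signal every loop to exit, then join the threads and close the sockets."""
    stop.set()
    for sock, thread in workers:
        thread.join()
        sock.close()
```

With the port numbers from the example above, this would be called as
start_workers("0.0.0.0", range(1060, 1068), handle).  The remapping
between port 53 and 1060-1067 still has to happen outside the daemon
(load balancer or iptables); nothing in this sketch does it.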

This all depends, as Marc was saying, on how much per-packet
processing overhead you have in your daemon.  If the overhead were
higher, then yes, you might be better off with several threads/loops on
a single socket.  I'd say as a very rough rule of thumb: if your
processing code is light and optimized enough that it could handle
somewhere in the ballpark of 20-50K (or more) pps on a single
CPU core on the target machine, you're probably in the territory where
the socket is the limit and it's time to do thread-per-socket as
described above.
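
One rough way to apply that rule of thumb is to time the packet handler
by itself, with no socket involved, and compare the rate to the low end
of the 20-50K range.  The sketch below is only an illustration:
handler_pps and suggest_layout are hypothetical names, the 20,000
default is just the bottom of the ballpark above, and a real
measurement should be done on the daemon's own code on the target
machine.

```python
import time


def handler_pps(handle, packet, duration=0.2):
    """Call handle(packet) in a tight loop on one core; return calls per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        handle(packet)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


def suggest_layout(pps, threshold=20_000):
    """Very rough guide: handlers at or above the threshold are probably
    limited by the socket rather than by their own processing cost."""
    if pps >= threshold:
        return "thread-per-socket"
    return "shared-socket"
```

For example, suggest_layout(handler_pps(handle, sample_packet)) gives a
first guess, where handle and sample_packet stand in for your own
processing function and a representative request.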

-- Brandon

_______________________________________________
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev
