On Sun, Nov 04, 2007 at 12:15:56PM -0800, Steven Grimm wrote:
> On Nov 4, 2007, at 8:13 AM, Marc Lehmann wrote:
> > This would create additional loops (event_bases). The difference is
> > that these cannot handle signals (or child watchers) at all, with
> > the default loop being the only one to do signal handling.
> 
> This seems like a totally sane approach to me. Having multiple loops  
> is a big performance win for some applications (e.g., memcached in  
> multithreaded mode), so making the behavior a bit more consistent is a  
> good thing.

It's only a performance win when the context switches and cache stomping
caused by multiple threads cycling within their own contexts don't
outweigh the "latency" of a model using fewer threads, or even just one.

Consider a room with 20 people in it and a single door. The goal is to
hand each of them a football as a new football is dropped off the assembly
line and have them exit the door. You could throw them all a new football
right as it comes off the line and have them immediately rush for the door -
resulting in a logjam that you have to stop tending the assembly line to
untangle. You then head back to the line and resume the patterned task of
throwing footballs to workers as fast as you can - only to have the logjam
repeat itself.

The only way to solve this efficiently is to have fewer people try to exit
the door at once, or to add more doors (CPUs).

> Now if only there were a way to wake just one thread up when input  
> arrives on a descriptor being monitored by multiple threads... But  
> that isn't supported by any of the underlying poll mechanisms as far  
> as I can tell.
> 
> -Steve

It isn't typically supported because it's not a particularly useful or
efficient path to head down in the first place.

Thread pools, being what they are - incredibly useful and pretty much the
de facto standard in threaded code - do have their own abstraction limits
as well.

Setting up a thread pool - an inherently asynchronous and unordered
collection of contexts - to asynchronously process an ordered stream of
data (unless your protocol has no "sequence", which I doubt), presumably
in the name of performance, is a far more complex and troublesome design
than it needs to be. It's anchored in the "every thread can do anything"
school of thought, which has many hidden costs.

The issue in itself is having multiple threads monitor the *same* fd via
any kind of wait mechanism. It short-circuits the application layers so
that a thread (*any* thread in that pool) can immediately process new
data. I think it would be much more structured, less complex (i.e. better
performance in the long run anyway), and a cleaner design to have a set
number of threads (or even 1) handle the "controller" task of tending to
new network events, push them onto a per-connection PDU queue or
pre-process them in some form or fashion, signal a condition variable, and
let the previously mentioned thread pool handle them in an ordered
fashion. Having a group of threads listen on the same fd throws our
football manager out entirely and turns the whole thing into a
smash-and-grab for new footballs. There's still the door to get through.

-cl
_______________________________________________
Libevent-users mailing list
Libevent-users@monkey.org
http://monkey.org/mailman/listinfo/libevent-users