Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Scott Lamb
Christopher Layne wrote:
> On Sun, Nov 04, 2007 at 04:23:01PM -0800, Scott Lamb wrote:
>>> It wasn't what I expected; I was fully confident at first that the
>>> thread-pool, work-queue model would be the way to go, since it's one
>>> I've implemented in many applications in the past. But the numbers said
>>> otherwise.
>> Thanks for the case study. To rephrase (hopefully correctly), you tried
>> these two models:
>>
>> 1) one thread polls and puts events on a queue; a bunch of other threads
>> pull from the queue. (resulted in high latency, and I'm not too
>> surprised...an extra context switch before handling any events.)
> 
> So back to this..
> 
>> 2) a bunch of threads read and handle events independently. (your
>> current model.)
> 
> BTW: How does this model somehow exempt itself from said context switching
> issue of the former?

Hmm, William Ahern says that at least on Linux, they only wake one
thread per event. That would explain it.

>> Did you also try the so-called "leader/follower" model, in which the
>> thread which does the polling handles the first event and puts the rest
>> on a queue; another thread takes over polling if otherwise idle while
>> the first thread is still working. My impression is that this is a widely
>> favored model, though I don't know the details of where each performs best.
> 
> Something about this just seems like smoke and mirrors to me. At the end of
> the day we still only have a finite amount of CPU cores available to us and
> any amount of playing with the order of things is not going to extract any
> magical *more* throughput out of a given box. Yes, some of these methods
> influence recv/send buffers and have a cascading effect on overall throughput,
> but efficient code and algorithms are going to make the real difference - not
> goofy thread games.
> 
> (and this is coming from someone who *likes* comp.programming.threads)

Oh, I don't know, there is something to be said for not making a handoff
between threads if you can avoid it. You're not going to get more
throughput than n_cores times what you got with one processor, but I'd
expect avoiding context switches and cache bouncing to help you get
closer to that.
___
Libevent-users mailing list
Libevent-users@monkey.org
http://monkey.org/mailman/listinfo/libevent-users


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Christopher Layne
On Sun, Nov 04, 2007 at 04:23:01PM -0800, Scott Lamb wrote:
> > It wasn't what I expected; I was fully confident at first that the
> > thread-pool, work-queue model would be the way to go, since it's one
> > I've implemented in many applications in the past. But the numbers said
> > otherwise.
> 
> Thanks for the case study. To rephrase (hopefully correctly), you tried
> these two models:
> 
> 1) one thread polls and puts events on a queue; a bunch of other threads
> pull from the queue. (resulted in high latency, and I'm not too
> surprised...an extra context switch before handling any events.)

So back to this..

> 2) a bunch of threads read and handle events independently. (your
> current model.)

BTW: How does this model somehow exempt itself from said context switching
issue of the former?

> Did you also try the so-called "leader/follower" model, in which the
> thread which does the polling handles the first event and puts the rest
> on a queue; another thread takes over polling if otherwise idle while
> the first thread is still working. My impression is that this is a widely
> favored model, though I don't know the details of where each performs best.

Something about this just seems like smoke and mirrors to me. At the end of
the day we still only have a finite amount of CPU cores available to us and
any amount of playing with the order of things is not going to extract any
magical *more* throughput out of a given box. Yes, some of these methods
influence recv/send buffers and have a cascading effect on overall throughput,
but efficient code and algorithms are going to make the real difference - not
goofy thread games.

(and this is coming from someone who *likes* comp.programming.threads)

-cl


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread William Ahern
On Sun, Nov 04, 2007 at 03:18:42PM -0800, Steven Grimm wrote:
> You've just pretty accurately described my initial implementation of  
> thread support in memcached. It worked, but it was both more CPU- 
> intensive and had higher response latency (yes, I actually measured  
> it) than the model I'm using now. The only practical downside of my  
> current implementation is that when there is only one UDP packet  
> waiting to be processed, some CPU time is wasted on the threads that  
> don't end up winning the race to read it. But those threads were idle  
> at that instant anyway (or they wouldn't have been in a position to  
> wake up) so, according to my benchmarking, there doesn't turn out to  
> be an impact on latency. And though I am wasting CPU cycles, my total  
> CPU consumption still ends up being lower than passing messages around  
> between threads.
> 

Is this on Linux? They addressed the stampeding herd problem years ago. If
you dig deep down in the kernel you'll see their waitq implementation for
non-blocking socket work (and lots of other stuff). Only one thread is ever
woken per event.


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Scott Lamb
Steven Grimm wrote:
> On Nov 4, 2007, at 3:07 PM, Christopher Layne wrote:
>> The issue in itself is having multiple threads monitor the *same* fd via
>> any kind of wait mechanism. It's short circuiting application layers, so
>> that a thread (*any* thread in that pool) can immediately process new
>> data. I think it would be much more structured, less complex (i.e. better
>> performance in the long run anyways), and a cleaner design to have a set
>> number (or even 1) thread handle the "controller" task of tending to new
>> network events, push them onto a per-connection PDU queue, or pre-process
>> in some form or fashion, condsig, and let previously mentioned thread pool
>> handle it in an ordered fashion.
> 
> You've just pretty accurately described my initial implementation of
> thread support in memcached. It worked, but it was both more
> CPU-intensive and had higher response latency (yes, I actually measured
> it) than the model I'm using now. The only practical downside of my
> current implementation is that when there is only one UDP packet waiting
> to be processed, some CPU time is wasted on the threads that don't end
> up winning the race to read it. But those threads were idle at that
> instant anyway (or they wouldn't have been in a position to wake up) so,
> according to my benchmarking, there doesn't turn out to be an impact on
> latency. And though I am wasting CPU cycles, my total CPU consumption
> still ends up being lower than passing messages around between threads.
> 
> It wasn't what I expected; I was fully confident at first that the
> thread-pool, work-queue model would be the way to go, since it's one
> I've implemented in many applications in the past. But the numbers said
> otherwise.

Thanks for the case study. To rephrase (hopefully correctly), you tried
these two models:

1) one thread polls and puts events on a queue; a bunch of other threads
pull from the queue. (resulted in high latency, and I'm not too
surprised...an extra context switch before handling any events.)

2) a bunch of threads read and handle events independently. (your
current model.)

Did you also try the so-called "leader/follower" model, in which the
thread which does the polling handles the first event and puts the rest
on a queue; another thread takes over polling if otherwise idle while
the first thread is still working. My impression is that this is a widely
favored model, though I don't know the details of where each performs best.
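
The leader/follower model described above can be sketched with plain pthreads. This is a minimal sketch under stated assumptions, not anyone's actual implementation: a pipe stands in for the polled descriptor (a real server would sit in poll() or epoll_wait() at that point), each 'e' byte is one event, each 'q' byte tells one thread to exit, the leader handles one event per wakeup instead of queueing the rest, and all names (`lf_worker`, `lf_demo`, `LF_THREADS`) are hypothetical. The key step is that the leader hands the polling role to a follower before handling the event itself, so the event is never passed between threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

#define LF_THREADS 4

static pthread_mutex_t lf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t lf_cond = PTHREAD_COND_INITIALIZER;
static int lf_have_leader = 0; /* is some thread currently the poller? */
static int lf_handled = 0;     /* events handled; guarded by lf_lock */
static int lf_pipe[2];         /* stands in for the polled descriptor */

static void lf_put(char c)
{
    ssize_t w = write(lf_pipe[1], &c, 1);
    assert(w == 1);
    (void)w;
}

static void *lf_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Follower: wait until no thread holds the leader role, then take it. */
        pthread_mutex_lock(&lf_lock);
        while (lf_have_leader)
            pthread_cond_wait(&lf_cond, &lf_lock);
        lf_have_leader = 1;
        pthread_mutex_unlock(&lf_lock);

        /* Leader: the only thread waiting on the descriptor. */
        char ev = 0;
        ssize_t n = read(lf_pipe[0], &ev, 1);

        /* Promote a follower before handling the event, so some thread
           keeps polling while this one works. */
        pthread_mutex_lock(&lf_lock);
        lf_have_leader = 0;
        pthread_cond_signal(&lf_cond);
        pthread_mutex_unlock(&lf_lock);

        if (n != 1 || ev == 'q')
            return NULL; /* quit marker, EOF or error */

        /* Handle the event in the thread that received it: no hand-off. */
        pthread_mutex_lock(&lf_lock);
        lf_handled++;
        pthread_mutex_unlock(&lf_lock);
    }
}

/* Push nevents events through LF_THREADS threads; return how many were handled. */
int lf_demo(int nevents)
{
    pthread_t tids[LF_THREADS];
    int rc = pipe(lf_pipe);
    assert(rc == 0);
    lf_handled = 0;
    for (int i = 0; i < LF_THREADS; i++) {
        rc = pthread_create(&tids[i], NULL, lf_worker, NULL);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < nevents; i++)
        lf_put('e');
    for (int i = 0; i < LF_THREADS; i++)
        lf_put('q'); /* one quit marker per thread */
    for (int i = 0; i < LF_THREADS; i++)
        pthread_join(tids[i], NULL);
    close(lf_pipe[0]);
    close(lf_pipe[1]);
    return lf_handled;
}
```

`lf_demo(100)` should return 100. Because only the current leader ever reads, events are consumed in order, and one extra lock round-trip per event replaces the queue hand-off of model 1.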


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Steven Grimm

On Nov 4, 2007, at 3:07 PM, Christopher Layne wrote:
> The issue in itself is having multiple threads monitor the *same* fd via any
> kind of wait mechanism. It's short circuiting application layers, so that a
> thread (*any* thread in that pool) can immediately process new data. I think
> it would be much more structured, less complex (i.e. better performance in
> the long run anyways), and a cleaner design to have a set number (or even
> 1) thread handle the "controller" task of tending to new network events,
> push them onto a per-connection PDU queue, or pre-process in some form or
> fashion, condsig, and let previously mentioned thread pool handle it in an
> ordered fashion.


You've just pretty accurately described my initial implementation of  
thread support in memcached. It worked, but it was both more CPU- 
intensive and had higher response latency (yes, I actually measured  
it) than the model I'm using now. The only practical downside of my  
current implementation is that when there is only one UDP packet  
waiting to be processed, some CPU time is wasted on the threads that  
don't end up winning the race to read it. But those threads were idle  
at that instant anyway (or they wouldn't have been in a position to  
wake up) so, according to my benchmarking, there doesn't turn out to  
be an impact on latency. And though I am wasting CPU cycles, my total  
CPU consumption still ends up being lower than passing messages around  
between threads.


It wasn't what I expected; I was fully confident at first that the  
thread-pool, work-queue model would be the way to go, since it's one  
I've implemented in many applications in the past. But the numbers  
said otherwise.


-Steve


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Adrian Chadd
On Sun, Nov 04, 2007, Steven Grimm wrote:

> >Would this be for listen sockets, or for general read/write IO on an FD?
> 
> Specifically for a mixed TCP- and UDP-based protocol where any thread  
> is equally able to handle an incoming request on the UDP socket, but  
> TCP sockets are bound to particular threads.

Makes sense. Doesn't the Solaris event ports system handle this? I haven't
checked in depth.

It sounds like something that kqueue could be extended to do relatively
easily.

What about multiple threads blocking on the same UDP socket? Do multiple
threads wake up when IO arrives? Or just one?




Adrian



Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Steven Grimm


On Nov 4, 2007, at 3:03 PM, Adrian Chadd wrote:
> On Sun, Nov 04, 2007, Steven Grimm wrote:
>> Now if only there were a way to wake just one thread up when input
>> arrives on a descriptor being monitored by multiple threads... But
>> that isn't supported by any of the underlying poll mechanisms as far
>> as I can tell.
>
> Would this be for listen sockets, or for general read/write IO on an FD?


Specifically for a mixed TCP- and UDP-based protocol where any thread  
is equally able to handle an incoming request on the UDP socket, but  
TCP sockets are bound to particular threads.


Unfortunately the vast majority of incoming requests are on the UDP  
socket, too many to handle on one thread.


Before anyone suggests it: a message-passing architecture (one thread  
reads the UDP socket and queues up work for other threads) gave me  
measurably higher request-handling latency than the current setup,  
which works but eats lots of system CPU time because all the threads  
wake up on each UDP packet. It makes sense: the current scheme  
involves fewer context switches for a given request (at least, on the  
thread that ends up handling it), and context switches aren't free.


Ideally I'd love a mode where I could say, "Only trigger one of the  
waiting epoll instances when this descriptor has input available."  
Sort of pthread_cond_signal() semantics, as opposed to the current  
pthread_cond_broadcast() semantics. (Yes, I'm aware that  
pthread_cond_signal() is not *guaranteed* to wake up only one waiting  
thread -- but in practice that's what it does.)
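
The current scheme described in this message, where every idle thread waits on the same UDP socket and the threads that lose the race to read a packet simply go back to waiting, can be sketched as below. Assumptions: poll() stands in for each thread's event loop, the socket is non-blocking and bound to an ephemeral port on 127.0.0.1, the 100 ms poll timeout and the `udp_stop` flag exist only so the demo can shut down, and every name here is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define UDP_THREADS 4

static int udp_fd;
static atomic_int udp_handled;   /* packets some thread actually read */
static atomic_int udp_lost_race; /* wakeups that found the socket drained */
static atomic_int udp_stop;

static void *udp_worker(void *arg)
{
    (void)arg;
    char buf[1500];
    while (!atomic_load(&udp_stop)) {
        /* Every idle worker waits on the same descriptor. */
        struct pollfd pfd = { .fd = udp_fd, .events = POLLIN };
        if (poll(&pfd, 1, 100) <= 0)
            continue; /* timeout: re-check the stop flag */
        ssize_t n = recv(udp_fd, buf, sizeof buf, 0);
        if (n >= 0)
            atomic_fetch_add(&udp_handled, 1);   /* won the race */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            atomic_fetch_add(&udp_lost_race, 1); /* woken, but too late */
    }
    return NULL;
}

/* Send npackets to a socket shared by UDP_THREADS pollers; return how many were read. */
int udp_demo(int npackets)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    socklen_t len = sizeof addr;
    pthread_t tids[UDP_THREADS];
    struct timespec one_ms = { 0, 1000000 };
    int rc;

    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a port */
    udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (udp_fd < 0)
        return -1;
    fcntl(udp_fd, F_SETFL, fcntl(udp_fd, F_GETFL) | O_NONBLOCK);
    rc = bind(udp_fd, (struct sockaddr *)&addr, sizeof addr);
    if (rc == 0)
        rc = getsockname(udp_fd, (struct sockaddr *)&addr, &len);
    if (rc != 0) {
        close(udp_fd);
        return -1;
    }

    atomic_store(&udp_handled, 0);
    atomic_store(&udp_lost_race, 0);
    atomic_store(&udp_stop, 0);
    for (int i = 0; i < UDP_THREADS; i++)
        pthread_create(&tids[i], NULL, udp_worker, NULL);

    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    for (int i = 0; i < npackets; i++)
        sendto(tx, "x", 1, 0, (struct sockaddr *)&addr, sizeof addr);

    /* Give the workers up to about 5 s to drain the socket, then stop them. */
    for (int waited = 0; atomic_load(&udp_handled) < npackets && waited < 5000; waited++)
        nanosleep(&one_ms, NULL);
    atomic_store(&udp_stop, 1);
    for (int i = 0; i < UDP_THREADS; i++)
        pthread_join(tids[i], NULL);
    close(tx);
    close(udp_fd);
    return atomic_load(&udp_handled);
}
```

`udp_lost_race` counts wakeups that found the socket already drained, i.e. the wasted CPU time mentioned above; its value depends on the kernel's wake-up policy and on timing, so the demo only returns the number of packets read. Later Linux kernels (4.5 and newer) added an `EPOLLEXCLUSIVE` flag for `epoll_ctl()` that, when each thread has its own epoll instance watching the shared descriptor, wakes one or more of those instances rather than all of them, which is roughly the behaviour wished for here.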


-Steve


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Christopher Layne
On Sun, Nov 04, 2007 at 12:15:56PM -0800, Steven Grimm wrote:
> On Nov 4, 2007, at 8:13 AM, Marc Lehmann wrote:
> >This would create additional loops (event_bases). The difference is that
> >these cannot handle signals (or child watchers) at all, with the default
> >loop being the only one to do signal handling.
> 
> This seems like a totally sane approach to me. Having multiple loops  
> is a big performance win for some applications (e.g., memcached in  
> multithreaded mode), so making the behavior a bit more consistent is a  
> good thing.

It's only a performance win when the context switches and cache stomping
that result from multiple threads cycling within their own contexts do not
outweigh the "latency" of a model using fewer threads, or even just one.

Consider a room with 20 people in it and a single door. The goal is to
hand each of them a football as it comes off the assembly line and have
them exit through the door. You could throw each new football to someone
the moment it comes off the line and have them rush for the door at once,
resulting in a log jam that you have to stop tending the assembly line to
clear. You then head back to the line and resume the patterned task of
throwing footballs to workers as fast as you can, only to have the log jam
repeat itself.

The only way to solve this efficiently is to have fewer people try to exit
through the door at once, or to add more doors (CPUs).

> Now if only there were a way to wake just one thread up when input  
> arrives on a descriptor being monitored by multiple threads... But  
> that isn't supported by any of the underlying poll mechanisms as far  
> as I can tell.
> 
> -Steve

It isn't typically supported because it's not a particularly useful or
efficient path to head down in the first place.

Thread pools, being what they are (incredibly useful and pretty much the de
facto standard in threaded code), do have their own abstraction limits as well.

Setting up a thread pool, an inherently asynchronous and unordered collection
of contexts, to asynchronously process an ordered stream of data (unless
your protocol has no "sequence", which I doubt), presumably in the name of
performance, is a far more complex and troublesome design than it needs to
be. It's anchored somewhat to the "every thread can do anything" school of
thought, which has many hidden costs.

The issue in itself is having multiple threads monitor the *same* fd via any
kind of wait mechanism. It's short-circuiting application layers so that a
thread (*any* thread in that pool) can immediately process new data. I think
it would be much more structured, less complex (i.e. better performance in
the long run anyway), and a cleaner design to have a set number of threads
(or even one) handle the "controller" task of tending to new network events,
push them onto a per-connection PDU queue (or pre-process them in some form
or fashion), condsig, and let the previously mentioned thread pool handle
them in an ordered fashion. Having a group of threads listen on the same fd
throws our football manager out entirely and turns the room into a
smash-and-grab for new footballs. There's still the door to get through.
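
For comparison, the controller/thread-pool design proposed here, with one thread tending the network and handing work to a pool through a queue and a condition-variable signal (the "condsig"), might look like the sketch below. Assumptions: a "PDU" is just an int, the queue is a fixed-size ring guarded by one mutex, pthread_cond_broadcast() is used only for shutdown, and all names are hypothetical. This simple version does not preserve ordering across workers; the per-connection ordering described above would need more structure, for example one queue per connection.

```c
#include <assert.h>
#include <pthread.h>

#define WQ_CAP 64
#define WQ_WORKERS 4

struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty; /* signalled once per pushed item */
    pthread_cond_t nonfull;  /* lets the controller wait for space */
    int items[WQ_CAP];
    int head, count;
    int closed;              /* no more items will be pushed */
    long sum;                /* result of "handling": sum of item values */
};

static struct work_queue wq = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER,
    .nonfull = PTHREAD_COND_INITIALIZER,
};

/* Controller side: enqueue one item and wake (usually) one worker. */
static void wq_push(int item)
{
    pthread_mutex_lock(&wq.lock);
    while (wq.count == WQ_CAP)
        pthread_cond_wait(&wq.nonfull, &wq.lock);
    wq.items[(wq.head + wq.count) % WQ_CAP] = item;
    wq.count++;
    pthread_cond_signal(&wq.nonempty);
    pthread_mutex_unlock(&wq.lock);
}

/* Shutdown: every idle worker must see this, hence broadcast. */
static void wq_close(void)
{
    pthread_mutex_lock(&wq.lock);
    wq.closed = 1;
    pthread_cond_broadcast(&wq.nonempty);
    pthread_mutex_unlock(&wq.lock);
}

static void *wq_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&wq.lock);
    for (;;) {
        /* Re-check the predicate: a wakeup is a hint, not a guarantee. */
        while (wq.count == 0 && !wq.closed)
            pthread_cond_wait(&wq.nonempty, &wq.lock);
        if (wq.count == 0) /* closed and fully drained */
            break;
        int item = wq.items[wq.head];
        wq.head = (wq.head + 1) % WQ_CAP;
        wq.count--;
        pthread_cond_signal(&wq.nonfull);
        pthread_mutex_unlock(&wq.lock);

        long result = item; /* stand-in for real per-PDU work */

        pthread_mutex_lock(&wq.lock);
        wq.sum += result;
    }
    pthread_mutex_unlock(&wq.lock);
    return NULL;
}

/* The controller pushes 1..n; returns the sum the worker pool computed. */
long wq_demo(int n)
{
    pthread_t tids[WQ_WORKERS];
    wq.head = wq.count = wq.closed = 0;
    wq.sum = 0;
    for (int i = 0; i < WQ_WORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, wq_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 1; i <= n; i++)
        wq_push(i);
    wq_close();
    for (int i = 0; i < WQ_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return wq.sum;
}
```

`wq_demo(1000)` should return 500500, the sum of 1..1000. Every item crosses a thread boundary between `wq_push()` and `wq_worker()`; that hand-off is what Steve measured as adding latency in his first memcached implementation.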

-cl


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Adrian Chadd
On Sun, Nov 04, 2007, Steven Grimm wrote:

> Now if only there were a way to wake just one thread up when input  
> arrives on a descriptor being monitored by multiple threads... But  
> that isn't supported by any of the underlying poll mechanisms as far  
> as I can tell.

Would this be for listen sockets, or for general read/write IO on an FD?




Adrian



Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Steven Grimm

On Nov 4, 2007, at 8:13 AM, Marc Lehmann wrote:
> This would create additional loops (event_bases). The difference is that
> these cannot handle signals (or child watchers) at all, with the default
> loop being the only one to do signal handling.


This seems like a totally sane approach to me. Having multiple loops  
is a big performance win for some applications (e.g., memcached in  
multithreaded mode), so making the behavior a bit more consistent is a  
good thing.


Now if only there were a way to wake just one thread up when input  
arrives on a descriptor being monitored by multiple threads... But  
that isn't supported by any of the underlying poll mechanisms as far  
as I can tell.


-Steve



[Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Marc Lehmann
> On Sat, Nov 03, 2007 at 03:45:39PM -0700, William Ahern wrote:
> > Curious how you managed to do this. Are you checking the process PID on each
> > loop?
> 
> I considered that, but I think it's too slow (one also needs to be careful
> that watchers don't change e.g. epoll state until the getpid check is
> done), or at least I think I don't want that speed hit, no matter what.
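
The per-iteration PID check William asks about would look roughly like this. It is a hypothetical sketch: `struct fork_loop`, `fork_loop_check()` and `reinit_kernel_state()` are made-up names, and a real loop would rebuild its epoll or kqueue descriptor where the sketch only increments a counter. The cost Marc objects to is the getpid() call on every pass through the loop, paid even by programs that never fork; the explicit `ev_default_fork()`/`ev_loop_fork()` calls proposed below avoid it by having the application say when a fork happened.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct fork_loop {
    pid_t owner_pid;  /* process that created the kernel state */
    int reinit_count; /* how often the kernel state was rebuilt */
};

static void reinit_kernel_state(struct fork_loop *loop)
{
    /* A real loop would close and recreate its epoll/kqueue descriptor
       here and re-register every watcher; this sketch only counts. */
    loop->reinit_count++;
}

/* The per-iteration check: one getpid() call on every pass. */
static void fork_loop_check(struct fork_loop *loop)
{
    pid_t now = getpid();
    if (now != loop->owner_pid) { /* we are running in a forked child */
        reinit_kernel_state(loop);
        loop->owner_pid = now;
    }
}

/* Returns 1 if the parent never reinitialised and the child did exactly once. */
int fork_check_demo(void)
{
    struct fork_loop loop = { getpid(), 0 };
    fork_loop_check(&loop); /* parent: nothing to do */
    if (loop.reinit_count != 0)
        return 0;

    pid_t child = fork();
    if (child < 0)
        return 0;
    if (child == 0) {
        fork_loop_check(&loop); /* first iteration in the child */
        fork_loop_check(&loop); /* later iterations: already current */
        _exit(loop.reinit_count == 1 ? 0 : 1);
    }
    int status;
    if (waitpid(child, &status, 0) != child)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

`fork_check_demo()` should return 1: the parent never rebuilds its state, and the child rebuilds it exactly once.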

After giving signal handling and threads a lot of thought, I came to these
conclusions:

- requiring pthreads or Windows mutexes by default is not acceptable,
  but that's the only way to distribute signal events among event loops
  properly, or globally among many threads if signal handling were global.
- the only way to do it without locking is to allow only a single
  loop to handle signal events.

This is the interface I came up with to manage multiple loops (which I
think makes more sense than the interface currently in libevent):

   struct ev_loop *ev_default_loop (int methods);
   void ev_default_destroy (void);
   void ev_default_fork (void);

This would create "the default" loop (event_base). ev_default_loop
would always return the same loop, and it would be the one to use for
third-party libraries in general, too. The fork method can be called in
the parent or child (or even in both, or without forking), and it would
destroy and recreate the kernel state but keep all the watchers for the
default loop.

   struct ev_loop *ev_loop_new (int methods);
   void ev_loop_destroy (EV_P);
   void ev_loop_fork (EV_P);

This would create additional loops (event_bases). The difference is that
these cannot handle signals (or child watchers) at all, with the default loop
being the only one to do signal handling.
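
A toy model of the split proposed here, in which only the default loop may own signal watchers while additional loops refuse them, could look like this. The `toy_*` names are hypothetical stand-ins for the `ev_default_loop()`/`ev_loop_new()` interface above, not libev or libevent code, and the toy loops hold only the state needed to show the rule.

```c
#include <stdlib.h>

struct toy_loop {
    int is_default; /* only the default loop may handle signals */
    int nsignals;   /* signal watchers registered on this loop */
};

static struct toy_loop *toy_default; /* created once, on first use */

/* Always returns the same loop, so the application and any third-party
   libraries share it. Not thread-safe: call it once at startup. */
struct toy_loop *toy_default_loop(void)
{
    if (!toy_default) {
        toy_default = calloc(1, sizeof *toy_default);
        if (toy_default)
            toy_default->is_default = 1;
    }
    return toy_default;
}

/* Additional loops, e.g. one per worker thread, never handle signals. */
struct toy_loop *toy_loop_new(void)
{
    return calloc(1, sizeof(struct toy_loop));
}

/* Returns 0 on success, -1 if this loop is not allowed to watch signals. */
int toy_signal_start(struct toy_loop *loop, int signum)
{
    (void)signum; /* a real loop would install a handler for signum here */
    if (!loop || !loop->is_default)
        return -1;
    loop->nsignals++;
    return 0;
}
```

Because the rule is enforced at registration time, no locking is needed to decide which loop receives a signal: there is only ever one candidate.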

This would be consistent with how signals are usually handled in a pthreads
environment: block signals in all threads and in one thread handle them all
(sigwait, or using the default mainloop).
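
The usual pthreads arrangement mentioned above (block signals in all threads, accept them in one) can be sketched with `pthread_sigmask()` and `sigwait()`. SIGUSR1 and the names `signal_thread` and `sigwait_demo` are illustrative assumptions. The mask is changed before any other thread is created because new threads inherit the creating thread's signal mask; a process-directed signal that is blocked in every thread stays pending until `sigwait()` accepts it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static sigset_t owned_set;      /* signals the dedicated thread handles */
static int received_signal = 0; /* written by the signal thread */

static void *signal_thread(void *arg)
{
    (void)arg;
    int sig;
    /* Sleeps until a signal from owned_set is pending, then accepts it. */
    if (sigwait(&owned_set, &sig) == 0)
        received_signal = sig;
    return NULL;
}

/* Returns the signal number the dedicated thread received, or -1. */
int sigwait_demo(void)
{
    pthread_t tid;
    sigset_t old;

    sigemptyset(&owned_set);
    sigaddset(&owned_set, SIGUSR1);
    /* Block before creating threads: new threads inherit this mask. */
    if (pthread_sigmask(SIG_BLOCK, &owned_set, &old) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, NULL) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* The signal is blocked in every thread here, so it stays pending
       until sigwait() in signal_thread accepts it. */
    kill(getpid(), SIGUSR1);
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return received_signal;
}
```

`sigwait_demo()` should return SIGUSR1. In the proposal above, the default loop would play the role of `signal_thread`.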

No locking inside libevent would be required this way.

I'll implement this in my libev replacement code, unless somebody else comes
up with a better idea.

One such idea that isn't better, but different, would be to require the
user to provide mutex support, such as in ev_init_locking (size, init_cb,
lock_cb, unlock_cb, free_cb) or similar, then use locking and let any
event loop handle the signals and distribute signal events to the relevant
loops. But I am not sure how much locking would be required and I assume
it would be a lot, as one would need to handle the case where one thread
handles a signal for an event_base currently in use by another thread.
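
The alternative described in this paragraph, where the user supplies mutex primitives through something like `ev_init_locking (size, init_cb, lock_cb, unlock_cb, free_cb)`, could be modelled as a table of callbacks that the library calls around shared signal state. Everything here is hypothetical: `demo_lock_ops`, `demo_init_locking()` and `demo_deliver_signal()` are illustrative names, not libevent or libev functions, and pthread mutexes serve as the user-supplied implementation.

```c
#include <pthread.h>
#include <stdlib.h>

/* Lock primitives supplied by the application (hypothetical interface). */
struct demo_lock_ops {
    size_t size; /* bytes the library allocates per lock */
    int  (*init)(void *l);
    void (*lock)(void *l);
    void (*unlock)(void *l);
    void (*free_lock)(void *l);
};

static struct demo_lock_ops lock_ops;
static void *signal_lock; /* guards the shared pending-signal table */
static int pending[64];   /* per-signal counts shared by all loops */

int demo_init_locking(const struct demo_lock_ops *ops)
{
    lock_ops = *ops;
    signal_lock = malloc(ops->size);
    if (!signal_lock || ops->init(signal_lock) != 0)
        return -1;
    return 0;
}

/* Any loop's thread may call this; the user's lock serialises access. */
void demo_deliver_signal(int signum)
{
    lock_ops.lock(signal_lock);
    pending[signum]++;
    lock_ops.unlock(signal_lock);
}

/* Application side: pthread mutexes wrapped in the callback signatures. */
static int  pt_init(void *l)   { return pthread_mutex_init(l, NULL); }
static void pt_lock(void *l)   { pthread_mutex_lock(l); }
static void pt_unlock(void *l) { pthread_mutex_unlock(l); }
static void pt_free(void *l)   { pthread_mutex_destroy(l); }

static void *deliver_many(void *arg)
{
    int signum = *(const int *)arg;
    for (int i = 0; i < 1000; i++)
        demo_deliver_signal(signum);
    return NULL;
}

/* Two threads record the same signal concurrently; returns the final count. */
int locking_demo(void)
{
    struct demo_lock_ops ops = { sizeof(pthread_mutex_t),
                                 pt_init, pt_lock, pt_unlock, pt_free };
    int signum = 10;
    pthread_t a, b;

    if (demo_init_locking(&ops) != 0)
        return -1;
    pthread_create(&a, NULL, deliver_many, &signum);
    pthread_create(&b, NULL, deliver_many, &signum);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    lock_ops.free_lock(signal_lock);
    free(signal_lock);
    return pending[signum];
}
```

`locking_demo()` should return 2000. The hard case raised above, one thread handling a signal for an event_base that another thread is currently running, is not modelled; the sketch only shows the shape of the callback interface and that every shared access now pays for a user-supplied lock.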

Looking at the code in libevent, it seems that signals get handled by
whatever loop was started last, so signal handling is not reliable at all
unless one registers the signal handlers in all threads, which is hard to
do in a thread-safe manner (for the user code).

Having a deterministic model where one loop handles all that would definitely
be an improvement over this.

-- 
The choice of a GNU generation
Deliantra, the free code+content MORPG  http://www.deliantra.net
Marc Lehmann