Re: request for comments: multiple-connections-per-thread MPM design

2002-12-12 Thread Manoj Kasichainula
Took too long to respond. Oh well, no one else did either...

On Tue, Nov 26, 2002 at 01:14:10AM -0500, Glenn wrote:
 On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
  BTW, ISTR Ryan commenting a while back that cross-thread signalling
  isn't reliable, and it scares me in general, so I'd lean towards the
  pipe.
  
  I'm pondering what else could be done about this; having to muck with a
  pipe doesn't feel like the right thing to do.
 
 Why not?

Good question. I'm still waffling on this.

 Add a descriptor (pipe, socket, whatever) to the pollset and use
 it to indicate the need to generate a new pollset.  The thread that sends
 info down this descriptor could be programmed to wait a short amount of
 time between sending triggers, so as not to cause the select() to return
 too, too often, but short enough not to delay the handling of new
 connections too long.

But what's a good value? Any value picked is going to be too annoying.
0.1 s means delaying lots of threads up to a tenth of a second. And
there would be good reasons both for lowering that value and for not
lowering it. Which would mean it would need to be a tunable parameter
depending on network and CPU characteristics, and needing a tunable
parameter for this just seems ooky.

But just picking a good value and sticking with it might not be too bad.
The correct thing to do would be to code it up and test, but I'd rather
have a reasonable idea of the chances for success first. :)

In the perfect case, each poll call would return immediately with lots
of file descriptors ready for work, and they would all get farmed out.
Then before the next poll runs, there are more file descriptors ready to
be polled. 

Hmmm, if the poll is waiting on fds for any length of time, it should be
ok to interrupt it, because by definition it's not doing anything else.

So maybe the way to go is to forget about waiting the 0.1 s to interrupt
poll. Just notify it immediately when there's an fd waiting to be polled.
If no other fds have work to provide, we add the new fds to the poll set
and continue.

Otherwise, just run through all the other fds that need handling first,
then pick off all the fds that are waiting for polling and add them to
the fd set.

So (again using terms from my proposal):

submit_ticket would push fds into a queue and write to new_event_pipe if
the queue was empty when pushing.
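
A minimal sketch of what that could look like, assuming plain pthreads
and a pipe(2) created at startup; the queue shape and every name here
are invented for illustration:

    #include <pthread.h>
    #include <unistd.h>

    #define QUEUE_MAX 1024

    static int new_event_pipe[2];           /* filled in by pipe(2) */
    static int new_event_queue[QUEUE_MAX];
    static int queue_len;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    void submit_ticket(int fd)
    {
        int was_empty;

        pthread_mutex_lock(&queue_lock);
        was_empty = (queue_len == 0);
        new_event_queue[queue_len++] = fd;  /* overflow check elided */
        pthread_mutex_unlock(&queue_lock);

        /* Wake the poller only on the empty->non-empty transition,
         * so a burst of submissions costs one write(2), not one per
         * fd. */
        if (was_empty) {
            char c = 0;
            write(new_event_pipe[1], &c, 1);
        }
    }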

get_next_event would do something like:

if (previous_poll_fds_remaining) {
    pick one off, call event handler for it
}
else {
    clean out new_event_queue and put values into new poll set
    poll(pollfds, io_timeout);
    call event handler for one of the returned pollfds
}
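
In C, keeping the same invented names, the shape might be roughly this
(a sketch only; the helpers are placeholders for whatever pollset
bookkeeping the real MPM would do):

    #include <poll.h>
    #include <pthread.h>

    static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;

    int get_next_event(void)
    {
        int fd;

        pthread_mutex_lock(&poll_lock);
        if (!previous_poll_fds_remaining()) {
            /* fold queued fds into the pollset, then wait */
            drain_new_event_queue_into_pollset();
            poll(pollfds, npollfds, io_timeout);
            remember_returned_fds();
        }
        fd = take_one_ready_fd();
        pthread_mutex_unlock(&poll_lock);
        return fd;    /* caller runs the event handler for this fd */
    }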

Something was bothering me about this earlier, and I can't remember what
it is. Maybe it's that when the server isn't busy, a single ticket
submission will make 2 threads (the ticket submitter and the thread
holding the poll mutex) do stuff. Maybe even 3 threads since a new
thread could take the poll mutex. But since this is the unbusy case,
it's not quite so bad.




Re: request for comments: multiple-connections-per-thread MPM design

2002-12-12 Thread Glenn
On Thu, Dec 12, 2002 at 12:39:17AM -0800, Manoj Kasichainula wrote:
...
  Add a descriptor (pipe, socket, whatever) to the pollset and use
  it to indicate the need to generate a new pollset.  The thread that sends
  info down this descriptor could be programmed to wait a short amount of
  time between sending triggers, so as not to cause the select() to return
  too, too often, but short enough not to delay the handling of new
  connections too long.
 
 But what's a good value?
...
 Hmmm, if the poll is waiting on fds for any length of time, it should be
 ok to interrupt it, because by definition it's not doing anything else.
 
 So maybe the way to go is to forget about waiting the 0.1 s to interrupt
 poll. Just notify it immediately when there's an fd waiting to be polled.
 If no other fds have work to provide, we add the new fds to the poll set
 and continue.
 Otherwise, just run through all the other fds that need handling first,
 then pick off all the fds that are waiting for polling and add them to
 the fd set.
 
 So (again using terms from my proposal):
 
 submit_ticket would push fds into a queue and write to new_event_pipe if
 the queue was empty when pushing.
 
 get_next_event would do something like:
 
 if (previous_poll_fds_remaining) {
     pick one off, call event handler for it
 }
 else {
     clean out new_event_queue and put values into new poll set
     poll(pollfds, io_timeout);
     call event handler for one of the returned pollfds
 }
...

+1 on concept with comments:
Each time poll returns to handle ready fds, it should skip new_event_pipe
(it should not send that fd to an event handler), and it should check
new_event_queue for fds to add to the pollset before it returns to polling.

It should always be doing useful work or should be blocking in select(),
because it will always have at least one fd -- its end of new_event_pipe --
in its pollset.
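
Concretely, the step after poll() returns might look like this (slot 0
is assumed to hold the read end of new_event_pipe; a sketch only, with
the queue-draining helper invented):

    n = poll(pollfds, npollfds, -1);
    if (n > 0 && (pollfds[0].revents & POLLIN)) {
        char buf[64];
        read(pollfds[0].fd, buf, sizeof(buf)); /* discard wakeup bytes */
        add_queued_fds_to_pollset();           /* drain new_event_queue */
        n--;                       /* slot 0 never goes to a handler */
    }
    /* dispatch the remaining ready fds, starting at slot 1 */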


Coding to interrupt the poll immediately is the first thing to do, and
then a max short delay can be added to submit_ticket only if necessary.

As you said, the max short delay would only affect the unbusy case where
the poll is waiting on all current members of the pollset.  The short
delay had been suggested to prevent interrupting select() before select()
had a chance to do any useful work.  We won't know if this is a real or
imagined problem until it is tested.  It sounds like it won't be a
performance problem, although using the max short timer of even 0.05s might
slightly reduce the CPU usage of these threads when under heavy load.

-Glenn



Re: request for comments: multiple-connections-per-thread MPM design

2002-11-27 Thread Brian Pane
Juan Rivera wrote:


 Brian,

 Does your model assume that connections are short lived (HTTP)?

 One problem with the current model is that if you implement, let's say
 mod_socks, it holds a connection per thread.

 Is that something your model addresses?

 I'm looking into a pure async i/o model which addresses this problem
 but has bigger compatibility issues, yours may not.


My model supports long-lived connections with
short individual transactions (HTTP with keepalives,
for example).  I think Manoj's design could support
a broader range of protocols, because it doesn't
associate specific protocol handling states (read
request, prepare response, send response) with
dedicated thread pools the way my design does.

Brian





RE: request for comments: multiple-connections-per-thread MPM design

2002-11-26 Thread Juan Rivera
Brian,


Does your model assume that connections are short lived (HTTP)?


One problem with the current model is that if you implement, let's say mod_socks, it holds a connection per thread.


Is that something your model addresses?


I'm looking into a pure async i/o model which addresses this problem but has bigger compatibility issues, yours may not.


Best regards,


Juan C. Rivera
Citrix Systems, Inc.



-----Original Message-----
From: Brian Pane [mailto:[EMAIL PROTECTED]] 
Sent: Saturday, November 23, 2002 9:41 PM
To: [EMAIL PROTECTED]
Subject: request for comments: multiple-connections-per-thread MPM design


Here's an outline of my latest thinking on how to build a
multiple-connections-per-thread MPM for Apache 2.2. I'm
eager to hear feedback from others who have been researching
this topic.


Thanks,
Brian



Overview
--------

The design described here is a hybrid sync/async architecture:


* Do the slow part of request processing--network reads and
 writes--in an event loop for scalability.


* Do the fast part of request processing--everything other
 than network I/O--in a one-request-per-thread mode so that
 module developers don't have to rewrite all their code as
 reentrant state machines.



Basic structure
---------------


Each httpd child process has four thread pools:


1. Listener thread
 A Listener thread accept(2)s a connection, creates
 a conn_rec for it, and sends it to the Reader thread.


2. Reader thread
 A Reader thread runs a poll loop to watch for incoming
 data on all connections that have been passed to it by a
 Listener or Writer. It reads the next request from each
 connection, builds a request_rec, and passes the conn_rec
 and the request_rec on to the Request Processor thread
 pool.


3. Request Processor threads
 Each Request Processor thread handles one request_rec
 at a time. When it receives a request from the Reader
 thread, the Request Processor runs all the request
 processing hooks (auth, map to storage, handler, etc)
 except the logger, plus the output filter stack except
 the core_output_filter. As the Request Processor produces
 output brigades, it sends them to the Writer thread pool.
 Once the Request processor has finished handling the
 request, it sends the last of the output data, plus
 the request_rec, to the Writer.


4. Writer thread
 The Writer thread runs a poll loop to output the data
 for all connections that have been passed to it. When
 it finishes writing the response for a request, the
 Writer calls the logger, destroys the request_rec,
 and either executes the lingering_close on the connection
 or sends the connection back to the Reader, depending on
 whether the connection is a keep-alive.



Component details
-----------------


* Listener thread: This thread will need to use an accept_mutex
 to serialize the accept, just like 2.0 does.


* Passing connections from Listener to Reader: When the
 Listener creates a new connection, it adds it to a global
 queue and writes one byte to a pipe. The other end of the
 pipe is in the Reader's pollset. When the poll(2) in the
 Reader completes, the Reader detects the data available on
 the pipe, reads and discards the byte, and retrieves all
 the new connections in the queue.
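
 A rough rendering of both halves of that handoff (the queue, pipe,
 and helper names are invented here; only the byte-down-a-pipe pattern
 is from the design):

    /* Listener side */
    void pass_to_reader(conn_rec *c)
    {
        conn_queue_push(&reader_queue, c);  /* thread-safe queue, assumed */
        char byte = 0;
        write(reader_pipe[1], &byte, 1);    /* wakes the Reader's poll(2) */
    }

    /* Reader side, after poll(2) returns */
    if (pollfds[PIPE_SLOT].revents & POLLIN) {
        char byte;
        conn_rec *c;

        read(reader_pipe[0], &byte, 1);     /* consume one wakeup byte */
        while ((c = conn_queue_pop(&reader_queue)) != NULL)
            pollset_add_connection(c);      /* start watching its socket */
    }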


* Passing connections from Reader to Request Processor: When
 the Reader has consumed all the data in a connection, it
 adds the connection and the newly created request_rec to
 a global queue and signals a condition variable. The
 idle Request Processor threads take turns waiting on the
 condition variable (leader/followers model).
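
 Sketched with pthreads (the queue helpers and request_item type are
 placeholders, not Apache APIs):

    #include <pthread.h>

    static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t rq_nonempty = PTHREAD_COND_INITIALIZER;

    /* Reader side */
    void pass_to_processor(request_item *item)
    {
        pthread_mutex_lock(&rq_lock);
        request_queue_push(item);
        pthread_cond_signal(&rq_nonempty);  /* wake one idle thread */
        pthread_mutex_unlock(&rq_lock);
    }

    /* Request Processor side: the thread that wakes up becomes the
     * leader and takes the next request; the rest keep waiting */
    request_item *wait_for_request(void)
    {
        request_item *item;

        pthread_mutex_lock(&rq_lock);
        while (request_queue_empty())
            pthread_cond_wait(&rq_nonempty, &rq_lock);
        item = request_queue_pop();
        pthread_mutex_unlock(&rq_lock);
        return item;
    }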


* Passing output brigades from Request Processor to Writer:
 Same model as the Listener-to-Reader handoff: add to a
 queue, and write a byte to a pipe.


* Bucket management: Implicit in this design is the idea that
 the Writer thread can be writing part of an HTTP response
 while a Request Processor thread is still generating more
 buckets for that request. This is a good thing because it
 means that the Request Processor thread won't ever find itself
 blocked on a network write, so it can produce all its output
 quickly and move on to another request (which is the key to
 keeping the number of threads low). However, it does mean
 that we need a thread-safe solution for allocating and
 destroying buckets and brigades.


* request_rec lifetime: When a Request Processor thread has
 produced all of the output for a response, it adds a metadata
 bucket to the last output brigade. This bucket points to the
 request_rec. Upon sending the last of the request's output,
 the Writer thread is responsible for calling the logger and
 destroying the request and its pool. This would be a major
 change from how 1.x and 2.0 work. The rationale for it is
 twofold:
 - Eliminate the need to set aside buckets from the request
 pool into the connection pool in the core_output_filter,
 which has been a source of many bugs in 2.0

Re: request for comments: multiple-connections-per-thread MPM design

2002-11-25 Thread Manoj Kasichainula
On Sat, Nov 23, 2002 at 06:40:58PM -0800, Brian Pane wrote:
 Here's an outline of my latest thinking on how to build a
 multiple-connections-per-thread MPM for Apache 2.2.  I'm
 eager to hear feedback from others who have been researching
 this topic.

You prodded me into finally writing up a proposal that's been bouncing
around in my head for a while now. That was in a separate message; this
one has suggestions for your proposal.

 1. Listener thread
   A Listener thread accept(2)s a connection, creates
   a conn_rec for it, and sends it to the Reader thread.

Some (Most?) protocols have the server initiate the protocol
negotiation instead of the client, so the listener needs to be able to
pass off to the writer thread as well.

 * Limiting the Reader and Writer pools to one thread each will
   simplify the design and implementation.  But will this impair
   our ability to take advantage of lots of CPUs?

I was actually wondering why the reader and writer were separate
threads.

What gets more complex with a thread pool > 1? I know we'd have to add a
mutex around the select+(read|write), but is there something else?

 * Can we eliminate the listener thread?  It would be faster to just
   have the Reader thread include the listen socket(s) in its pollset.
   But if we did that, we'd need some new way to synchronize the
   accept handling among multiple child processes, because we can't
   have the Reader thread blocking on an accept mutex when it has
   existing connections to watch.

You could dispense with the listener thread in the single-process case
and just use an intraprocess mutex around select+(accept|read|write)
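
Something like this per thread, with the listen socket sitting in the
shared pollset (a sketch; the mutex name and helpers are invented):

    for (;;) {
        int fd;

        pthread_mutex_lock(&poll_mutex);    /* one poller at a time */
        fd = poll_and_claim_one_ready_fd(); /* removes fd from the set */
        pthread_mutex_unlock(&poll_mutex);

        if (fd == listen_fd)
            do_accept(listen_fd);  /* accept(2), add new conn to the set */
        else
            do_io(fd);             /* read or write, then re-add the fd */
    }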

 * Is there a more efficient way to interrupt a thread that's
   blocked in a poll call?  That's a crucial step in the Listener-to-
   Reader and Request Processor-to-Writer handoffs.  Writing a byte
   to a pipe requires two extra syscalls (a read and a write) per
   handoff.  Sending a signal to the target thread is the only
   other solution I can think of at the moment, but that's bad
   because the target thread might be in the middle of a read
   or write call, rather than a poll, at the moment when we hit
   it with a signal, so the read or write will fail with EINTR.

For Linux 2.6, file notifications could be done entirely in userland in
the case where no blocking is needed, using futexes.

But if you want to avoid the extra system calls, you could put a mutex
around maintenance of the pollset and just let the various threads dork
with it directly.

I do keep mentioning this mutex around the select/poll :). Is there a
performance reason that you're trying to avoid it? In my past skimmings,
I've seen you post a lot of benchmarks and such, so maybe you've studied
this.

I'm suspicious of signals, but as long as they are tightly controlled
with sigprocmask or pthread_sigmask, I guess they aren't so bad.




Re: request for comments: multiple-connections-per-thread MPM design

2002-11-25 Thread Manoj Kasichainula
On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
 On Mon, 2002-11-25 at 00:20, Manoj Kasichainula wrote:
  I was actually wondering why the reader and writer were separate
  threads.
 
 It was a combination of several factors that convinced me
 to make them separate:
 * Take advantage of multiple CPUs more easily

Yeah, but as you noticed, once you get more than 2 CPUs, you have the
same problem.

I'm just guessing here, but I imagine most CPU effort wouldn't be
expended in the actual kernel-user transitions that are polls and
non-blocking I/O.  And the meat of those operations could be handled by
other CPUs at the kernel level. So that separation onto multiple
CPUs might not help much.

 * Reduce the number of file descriptors that each poll call
   is handling (important on platforms where we don't have
   an efficient poll mechanism)

Has anyone read or benchmarked whether 2 threads polling 500 fds is
faster than 1 thread polling 1000?

  For Linux 2.6, file notifications could be done entirely in userland in
  the case where no blocking is needed, using futexes.
 
 Thanks!  I'll check out futexes.

Note that futexes are just Fast Userspace muTEXes. Those are already in the
kernel (according to some threads I read yesterday anyway). But I
believe the part about file notification using them is still in
discussion.

  But if you want to avoid the extra system calls, you could put a mutex
  around maintenance of the pollset and just let the various threads dork
  with it directly.
  
  I do keep mentioning this mutex around the select/poll :). Is there a
  performance reason that you're trying to avoid it? In my past skimmings,
  I've seen you post a lot of benchmarks and such, so maybe you've studied
  this.
 
 The real reason I don't like the mutex around the poll is that
 it would add too much latency if we had to wait for the current
 poll to complete before adding a new descriptor.  When the
 Listener accepts a new connection, or a Request Processor creates
 a new response brigade, it needs to get the corresponding socket
 added to the pollset immediately, which really requires interrupting
 the current poll.

Hmmm. That's a problem that needs solving even without the mutex though
(and it affects the design I proposed yesterday as well).  When you're
adding a new fd to the reader or writer, you have to write to a pipe or
send a signal. The mutex shouldn't affect that. 

BTW, ISTR Ryan commenting a while back that cross-thread signalling
isn't reliable, and it scares me in general, so I'd lean towards the
pipe.

I'm pondering what else could be done about this; having to muck with a
pipe doesn't feel like the right thing to do. Perhaps I should actually
look at other people's code to see what they do. Other designs have
threads for disk I/O and such, so there should be a way. I believe
Windows doesn't have this problem, or at least hides it better, because
completion ports are independent entities that don't interact with each
other as far as the user is concerned.




Re: Another async I/O proposal [was Re: request for comments: multiple-connections-per-thread MPM design]

2002-11-25 Thread Manoj Kasichainula
On Mon, Nov 25, 2002 at 08:10:12AM -0800, Brian Pane wrote:
 On Mon, 2002-11-25 at 00:02, Manoj Kasichainula wrote:
  while (event = get_next_event())
      add more spare threads if needed
      event_processor = lookup_event_processor(event)
      ticket = event_processor(event)
      if (ticket) submit_ticket(ticket)
      exit loop (and thus end thread) if not needed
  
  The event_processor can take as long as it wants, since there are other
  threads who can wait for the next event.
 
 Where is the locking done?  Is the lock just around the
 get_next_event() call?

Yeah, I imagined the locking would be implicit in there. Different event
mechanisms on various OSes could require different locking schemes, so
if locking is needed, it should be hidden there.

 Once the httpd_request_processor() has created a new ticket for
 the write, how does the submit_ticket() get the socket added into
 the pollset?  If it's possible for another request to be in the
 middle of a poll call at the same time, does submit_ticket()
 interrupt the poll in order to add the new descriptor?

This is a problem I missed somehow. I mentioned it in the other branch
of the thread.

 - Flow control will be difficult.  Here's a tricky scenario I
   just thought of:  The server is configured to run 10 threads.
   Most of the time, it only needs a couple of them, because it's
   serving mostly static content and an occasional PHP request.
   Suddenly, it gets a flood of requests for PHP pages.  The first
   ten of these quickly take over the ten available threads.
   PHP doesn't know how to work in an event-driven world, so each
   of these requests holds onto its thread for a long time.  When
   one of them finally completes, it produces some content to be
   written.  But the writes may be starved, because the first
   thread that finishes its PHP request and goes back into the
   select loop might find another incoming request and read it
   before doing any writes.  And if that new request is another
   long-running PHP request, it could be a while before we finally
   get around to doing the write.

Hmm, yeah, this is a concern. One answer is to set a very high
MaxThreadLimit, but then you can't control how many PHP threads you
have. Another answer is to reserve some threads for I/O, which your
design does.

   It's possible to partly work around this by implementing
   get_next_event() so that it completes all pending, unblocked
   writes before returning.  But more generally, we'll need some
   solution to keep long-running, non-event-based requests from
   taking over all the server threads.  (This is true of my design
   as well.)

Actually, in your design, since you have separate threads for I/O, I
don't see why it would suffer.
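
The workaround you describe -- completing all pending, unblocked
writes before get_next_event() returns -- might look roughly like
this, with invented helper names:

    /* after poll returns, flush any unblocked writes first */
    for (i = 0; i < nready; i++)
        if (is_write_event(&ready[i]))
            handle_event(&ready[i]);
    /* only then hand out reads and newly accepted connections */

though it still wouldn't bound how long a PHP-style handler can hold
a thread.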



Re: request for comments: multiple-connections-per-thread MPM design

2002-11-25 Thread Manoj Kasichainula
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Me at IO wrote:
 I'm just guessing here, but I imagine most CPU effort wouldn't be
 expended in the actual kernel-user transitions that are polls and
 non-blocking I/O.  And the meat of those operations could be handled by
 other CPUs at the kernel level. So that separation onto multiple
 CPUs might not help much.

Eh, I was on crack when I wrote this. You want an I/O thread per CPU
when you can get it.



Re: request for comments: multiple-connections-per-thread MPM design

2002-11-25 Thread Glenn
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
 On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
  The real reason I don't like the mutex around the poll is that
  it would add too much latency if we had to wait for the current
  poll to complete before adding a new descriptor.  When the
  Listener accepts a new connection, or a Request Processor creates
  a new response brigade, it needs to get the corresponding socket
  added to the pollset immediately, which really requires interrupting
  the current poll.
 
 Hmmm. That's a problem that needs solving even without the mutex though
 (and it affects the design I proposed yesterday as well).  When you're
 adding a new fd to the reader or writer, you have to write to a pipe or
 send a signal. The mutex shouldn't affect that. 
 
 BTW, ISTR Ryan commenting a while back that cross-thread signalling
 isn't reliable, and it scares me in general, so I'd lean towards the
 pipe.
 
 I'm pondering what else could be done about this; having to muck with a
 pipe doesn't feel like the right thing to do.

Why not?  Add a descriptor (pipe, socket, whatever) to the pollset and use
it to indicate the need to generate a new pollset.  The thread that sends
info down this descriptor could be programmed to wait a short amount of
time between sending triggers, so as not to cause the select() to return
too, too often, but short enough not to delay the handling of new
connections too long.  And the select()er thread would need to add a quick
step to check for this special descriptor instead of treating them all as
external requests.  It would also need to somehow signal the other thread
each time select() returned so that waiting descriptors could be added
immediately.

Or am I smoking what Manoj is smoking?

-Glenn