Re: request for comments: multiple-connections-per-thread MPM design
On Thu, Dec 12, 2002 at 12:39:17AM -0800, Manoj Kasichainula wrote:
...
> > Add a descriptor (pipe, socket, whatever) to the pollset and use
> > it to indicate the need to generate a new pollset. The thread that sends
> > info down this descriptor could be programmed to wait a short amount of
> > time between sending triggers, so as not to cause the select() to return
> > too, too often, but short enough not to delay the handling of new
> > connections too long.
>
> But what's a good value?
...
> Hmmm, if the poll is waiting on fds for any length of time, it should be
> ok to interrupt it, because by definition it's not doing anything else.
>
> So maybe the way to go is to forget about waiting the 0.1 s to interrupt
> poll. Just notify it immediately when there's a fd waiting to be polled.
> If no other fds have work to provide, we add the new fds to the poll set
> and continue.
> Otherwise, just run through all the other fds that need handling first,
> then pick off all the fds that are waiting for polling and add them to
> the fd set.
>
> So (again using terms from my proposal):
>
> submit_ticket would push fds into a queue and write to new_event_pipe if
> the queue was empty when pushing.
>
> get_next_event would do something like:
>
>     if (previous_poll_fds_remaining) {
>         pick one off, call event handler for it
>     }
>     else {
>         clean out new_event_queue and put values into new poll set
>         poll(pollfds, io_timeout);
>         call event handler for one of the returned pollfds
>     }
...

+1 on the concept, with comments:

Each time poll returns to handle ready fds, it should skip new_event_pipe
(it should not send that fd to an event handler), and it should check
new_event_queue for fds to add to the pollset before it returns to polling.
It should always either be doing useful work or be blocking in select(),
because it will always have at least one fd -- its end of new_event_pipe --
in its pollset.

Coding to interrupt the poll immediately is the first thing to do; a short
maximum delay can be added to submit_ticket afterwards, only if it proves
necessary. As you said, that delay would only affect the unbusy case where
the poll is waiting on all current members of the pollset. The short delay
had been suggested to keep select() from being interrupted before it had a
chance to do any useful work. We won't know whether that is a real or
imagined problem until it is tested. It sounds like it won't be a
performance problem, although a maximum delay of even 0.05s might slightly
reduce the CPU usage of these threads under heavy load.

-Glenn
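A minimal C sketch of the poll-side loop described above. The names
(new_event_pipe, new_event_queue, dispatch_event) and the fixed-size queue
are illustrative assumptions, not actual httpd/APR symbols; the point is
only that the pipe's read end is drained and never dispatched, and that
queued fds are merged into the pollset before each poll:

    /* Illustrative sketch only: new_event_pipe, new_event_queue, and
     * dispatch_event are made-up names, not httpd/APR symbols.
     * pipe(new_event_pipe) is assumed to have been called at startup. */
    #include <poll.h>
    #include <pthread.h>
    #include <unistd.h>

    #define MAX_FDS 1024

    int new_event_pipe[2];            /* [0] stays in the pollset forever */
    int new_event_queue[MAX_FDS];     /* fds waiting to join the pollset  */
    int new_event_queue_len;
    pthread_mutex_t new_event_lock = PTHREAD_MUTEX_INITIALIZER;

    static void dispatch_event(int fd)
    {
        (void)fd;        /* hand off to an event handler; stubbed here */
    }

    void poll_loop(int io_timeout)
    {
        struct pollfd pollfds[MAX_FDS];
        int nfds = 1;

        pollfds[0].fd = new_event_pipe[0];  /* always at least one fd to watch */
        pollfds[0].events = POLLIN;

        for (;;) {
            /* Merge fds submitted since the last iteration into the pollset. */
            pthread_mutex_lock(&new_event_lock);
            for (int i = 0; i < new_event_queue_len && nfds < MAX_FDS; i++) {
                pollfds[nfds].fd = new_event_queue[i];
                pollfds[nfds].events = POLLIN;  /* or POLLOUT for write work */
                nfds++;
            }
            new_event_queue_len = 0;
            pthread_mutex_unlock(&new_event_lock);

            if (poll(pollfds, nfds, io_timeout) <= 0)
                continue;

            for (int i = nfds - 1; i >= 0; i--) {
                if (!pollfds[i].revents)
                    continue;
                if (pollfds[i].fd == new_event_pipe[0]) {
                    char buf[64];
                    read(new_event_pipe[0], buf, sizeof buf);  /* drain; never dispatched */
                }
                else {
                    dispatch_event(pollfds[i].fd);
                    pollfds[i] = pollfds[--nfds];  /* handler owns it; drop from pollset */
                }
            }
        }
    }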
Re: request for comments: multiple-connections-per-thread MPM design
Took too long to respond. Oh well, no one else did either...

On Tue, Nov 26, 2002 at 01:14:10AM -0500, Glenn wrote:
> On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
> > BTW, ISTR Ryan commenting a while back that cross-thread signalling
> > isn't reliable, and it scares me in general, so I'd lean towards the
> > pipe.
> >
> > I'm pondering what else could be done about this; having to muck with a
> > pipe doesn't feel like the right thing to do.
>
> Why not?

Good question. I'm still waffling on this.

> Add a descriptor (pipe, socket, whatever) to the pollset and use
> it to indicate the need to generate a new pollset. The thread that sends
> info down this descriptor could be programmed to wait a short amount of
> time between sending triggers, so as not to cause the select() to return
> too, too often, but short enough not to delay the handling of new
> connections too long.

But what's a good value? Any value picked is going to be too annoying.
0.1 s means delaying lots of threads up to a tenth of a second. And there
would be good reasons for wanting to lower that value, and to not lower
that value. Which would mean it would need to be a tunable parameter
depending on network and CPU characteristics, and needing a tunable
parameter for this just seems ooky. But just picking a good value and
sticking with it might not be too bad.

The correct thing to do would be to code it up and test, but I'd rather
have a reasonable idea of the chances for success first. :)

In the perfect case, each poll call would return immediately with lots of
file descriptors ready for work, and they would all get farmed out. Then
before the next poll runs, there are more file descriptors ready to be
polled.

Hmmm, if the poll is waiting on fds for any length of time, it should be
ok to interrupt it, because by definition it's not doing anything else.

So maybe the way to go is to forget about waiting the 0.1 s to interrupt
poll. Just notify it immediately when there's a fd waiting to be polled.
If no other fds have work to provide, we add the new fds to the poll set
and continue. Otherwise, just run through all the other fds that need
handling first, then pick off all the fds that are waiting for polling and
add them to the fd set.

So (again using terms from my proposal):

submit_ticket would push fds into a queue and write to new_event_pipe if
the queue was empty when pushing.

get_next_event would do something like:

    if (previous_poll_fds_remaining) {
        pick one off, call event handler for it
    }
    else {
        clean out new_event_queue and put values into new poll set
        poll(pollfds, io_timeout);
        call event handler for one of the returned pollfds
    }

Something was bothering me about this earlier, and I can't remember what
it is. Maybe it's that when the server isn't busy, a single ticket
submission will make 2 threads (the ticket submitter and the thread
holding the poll mutex) do stuff. Maybe even 3 threads since a new thread
could take the poll mutex. But since this is the unbusy case, it's not
quite so bad.
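A matching C sketch of the submitter side, under the same illustrative
assumptions as the poll-loop sketch above (shared new_event_pipe and
new_event_queue globals, none of them real httpd/APR symbols): the wakeup
byte is written only when the queue was empty, so an idle poll is
interrupted at most once per batch of submissions.

    /* Illustrative sketch only; shares the globals declared in the
     * poll-loop sketch above, none of which are httpd/APR symbols. */
    #include <pthread.h>
    #include <unistd.h>

    #define MAX_FDS 1024

    extern int new_event_pipe[2];        /* [1] is the write end */
    extern int new_event_queue[MAX_FDS];
    extern int new_event_queue_len;
    extern pthread_mutex_t new_event_lock;

    void submit_ticket(int fd)
    {
        int was_empty;

        pthread_mutex_lock(&new_event_lock);
        was_empty = (new_event_queue_len == 0);
        if (new_event_queue_len < MAX_FDS)
            new_event_queue[new_event_queue_len++] = fd;
        pthread_mutex_unlock(&new_event_lock);

        if (was_empty) {
            char c = 1;
            write(new_event_pipe[1], &c, 1);  /* interrupt the poll immediately */
        }
    }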
Re: request for comments: multiple-connections-per-thread MPM design
Juan Rivera wrote:

Brian,

Does your model assume that connections are short lived (HTTP)?

One problem with the current model is that if you implement, let's say
mod_socks, it holds a connection per thread. Is that something your model
addresses?

I'm looking into a pure async i/o model which addresses this problem but
has bigger compatibility issues; yours may not.


My model supports long-lived connections with short individual
transactions (HTTP with keepalives, for example). I think Manoj's design
could support a broader range of protocols, because it doesn't associate
specific protocol handling states (read request, prepare response, send
response) with dedicated thread pools the way my design does.

Brian
RE: request for comments: multiple-connections-per-thread MPM design
Brian,

Does your model assume that connections are short lived (HTTP)?

One problem with the current model is that if you implement, let's say
mod_socks, it holds a connection per thread. Is that something your model
addresses?

I'm looking into a pure async i/o model which addresses this problem but
has bigger compatibility issues; yours may not.

Best regards,

Juan C. Rivera
Citrix Systems, Inc.

-----Original Message-----
From: Brian Pane [mailto:[EMAIL PROTECTED]]
Sent: Saturday, November 23, 2002 9:41 PM
To: [EMAIL PROTECTED]
Subject: request for comments: multiple-connections-per-thread MPM design

Here's an outline of my latest thinking on how to build a
multiple-connections-per-thread MPM for Apache 2.2. I'm eager to hear
feedback from others who have been researching this topic.

Thanks,
Brian

Overview
--------

The design described here is a hybrid sync/async architecture:

* Do the slow part of request processing--network reads and writes--in an
  event loop for scalability.

* Do the fast part of request processing--everything other than network
  I/O--in a one-request-per-thread mode so that module developers don't
  have to rewrite all their code as reentrant state machines.

Basic structure
---------------

Each httpd child process has four thread pools:

1. Listener thread

   A Listener thread accept(2)s a connection, creates a conn_rec for it,
   and sends it to the Reader thread.

2. Reader thread

   A Reader thread runs a poll loop to watch for incoming data on all
   connections that have been passed to it by a Listener or Writer. It
   reads the next request from each connection, builds a request_rec, and
   passes the conn_rec and the request_rec on to the Request Processor
   thread pool.

3. Request Processor threads

   Each Request Processor thread handles one request_rec at a time. When
   it receives a request from the Reader thread, the Request Processor
   runs all the request processing hooks (auth, map to storage, handler,
   etc) except the logger, plus the output filter stack except the
   core_output_filter. As the Request Processor produces output brigades,
   it sends them to the Writer thread pool. Once the Request processor
   has finished handling the request, it sends the last of the output
   data, plus the request_rec, to the Writer.

4. Writer thread

   The Writer thread runs a poll loop to output the data for all
   connections that have been passed to it. When it finishes writing the
   response for a request, the Writer calls the logger, destroys the
   request_rec, and either executes the lingering_close on the connection
   or sends the connection back to the Reader, depending on whether the
   connection is a keep-alive.

Component details
-----------------

* Listener thread: This thread will need to use an accept_mutex to
  serialize the accept, just like 2.0 does.

* Passing connections from Listener to Reader: When the Listener creates
  a new connection, it adds it to a global queue and writes one byte to a
  pipe. The other end of the pipe is in the Reader's pollset. When the
  poll(2) in the Reader completes, the Reader detects the data available
  on the pipe, reads and discards the byte, and retrieves all the new
  connections in the queue.

* Passing connections from Reader to Request Processor: When the Reader
  has consumed all the data in a connection, it adds the connection and
  the newly created request_rec to a global queue and signals a condition
  variable. The idle Request Processor threads take turns waiting on the
  condition variable (leader/followers model).
* Passing output brigades from Request Processor to Writer: Same model as
  the Listener-to-Reader handoff: add to a queue, and write a byte to a
  pipe.

* Bucket management: Implicit in this design is the idea that the Writer
  thread can be writing part of an HTTP response while a Request Processor
  thread is still generating more buckets for that request. This is a good
  thing because it means that the Request Processor thread won't ever find
  itself blocked on a network write, so it can produce all its output
  quickly and move on to another request (which is the key to keeping the
  number of threads low). However, it does mean that we need a thread-safe
  solution for allocating and destroying buckets and brigades.

* request_rec lifetime: When a Request Processor thread has produced all
  of the output for a response, it adds a metadata bucket to the last
  output brigade. This bucket points to the request_rec. Upon sending the
  last of the request's output, the Writer thread is responsible for
  calling the logger and destroying the request and its pool. This would
  be a major change from how 1.
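A rough C sketch of how the Writer's poll loop might push out pending
bytes without ever blocking, as the bucket-management point above assumes:
write until the socket reports EAGAIN, then leave the connection armed for
POLLOUT and move on. conn_data is an illustrative stand-in for the real
per-connection output state (brigades, buckets); this is a sketch, not the
actual core_output_filter logic.

    /* Illustrative sketch only: conn_data stands in for the real
     * per-connection output state (brigades, buckets). */
    #include <errno.h>
    #include <unistd.h>

    struct conn_data {
        int    fd;       /* non-blocking client socket        */
        char  *buf;      /* pending response bytes            */
        size_t len;      /* bytes still waiting to be written */
    };

    /* Returns 1 when everything has been written, 0 when the Writer should
     * keep POLLOUT armed for this fd and retry later, -1 on a real error. */
    int write_some(struct conn_data *c)
    {
        while (c->len > 0) {
            ssize_t n = write(c->fd, c->buf, c->len);
            if (n > 0) {
                c->buf += n;
                c->len -= (size_t)n;
            }
            else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return 0;        /* socket buffer full; poll for POLLOUT */
            }
            else if (n < 0 && errno == EINTR) {
                continue;        /* interrupted; just retry */
            }
            else {
                return -1;       /* caller logs and closes the connection */
            }
        }
        return 1;                /* done; Writer can log and recycle the conn */
    }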
Re: request for comments: multiple-connections-per-thread MPM design
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
> On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
> > The real reason I don't like the mutex around the poll is that
> > it would add too much latency if we had to wait for the current
> > poll to complete before adding a new descriptor. When the
> > Listener accepts a new connection, or a Request Processor creates
> > a new response brigade, it needs to get the corresponding socket
> > added to the pollset immediately, which really requires interrupting
> > the current poll.
>
> Hmmm. That's a problem that needs solving even without the mutex though
> (and it affects the design I proposed yesterday as well). When you're
> adding a new fd to the reader or writer, you have to write to a pipe or
> send a signal. The mutex shouldn't affect that.
>
> BTW, ISTR Ryan commenting a while back that cross-thread signalling
> isn't reliable, and it scares me in general, so I'd lean towards the
> pipe.
>
> I'm pondering what else could be done about this; having to muck with a
> pipe doesn't feel like the right thing to do.

Why not? Add a descriptor (pipe, socket, whatever) to the pollset and use
it to indicate the need to generate a new pollset. The thread that sends
info down this descriptor could be programmed to wait a short amount of
time between sending triggers, so as not to cause the select() to return
too, too often, but short enough not to delay the handling of new
connections too long.

And the select()er thread would need to add a quick step to check for this
special descriptor instead of treating them all as external requests. It
would also need to somehow signal the other thread each time select()
returned so that waiting descriptors could be added immediately.

Or am I smoking what Manoj is smoking?

-Glenn
Re: request for comments: multiple-connections-per-thread MPM design
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Me at IO wrote:
> I'm just guessing here, but I imagine most CPU effort wouldn't be
> expended in the actual kernel<->user transitions that are polls and
> non-blocking I/O. And the meat of those operations could be handled by
> other CPUs at the kernel level. So that separation onto multiple
> CPUs might not help much.

Eh, I was on crack when I wrote this. You want an I/O thread per CPU when
you can get it.
Re: Another async I/O proposal [was Re: request for comments: multiple-connections-per-thread MPM design]
On Mon, Nov 25, 2002 at 08:10:12AM -0800, Brian Pane wrote:
> On Mon, 2002-11-25 at 00:02, Manoj Kasichainula wrote:
> > while (event = get_next_event())
> >    add more spare threads if needed
> >    event_processor = lookup_event_processor(event)
> >    ticket = event_processor(event)
> >    if (ticket) submit_ticket(ticket)
> >    exit loop (and thus end thread) if not needed
> >
> > The event_processor can take as long as it wants, since there are other
> > threads who can wait for the next event.
>
> Where is the locking done? Is the lock just around the
> get_next_event() call?

Yeah, I imagined the locking would be implicit in there. Different event
mechanisms on various OSes could require different locking schemes, so if
locking is needed, it should be hidden there.

> Once the httpd_request_processor() has created a new ticket for
> the write, how does the submit_ticket() get the socket added into
> the pollset? If it's possible for another request to be in the
> middle of a poll call at the same time, does submit_ticket()
> interrupt the poll in order to add the new descriptor?

This is a problem I missed somehow. I mentioned it in the other branch of
the thread.

> - Flow control will be difficult. Here's a tricky scenario I
>   just thought of: The server is configured to run 10 threads.
>   Most of the time, it only needs a couple of them, because it's
>   serving mostly static content and an occasional PHP request.
>   Suddenly, it gets a flood of requests for PHP pages. The first
>   ten of these quickly take over the ten available threads.
>   PHP doesn't know how to work in an event-driven world, so each
>   of these requests holds onto its thread for a long time. When
>   one of them finally completes, it produces some content to be
>   written. But the writes may be starved, because the first
>   thread that finishes its PHP request and goes back into the
>   select loop might find another incoming request and read it
>   before doing any writes. And if that new request is another
>   long-running PHP request, it could be a while before we finally
>   get around to doing the write.

Hmm, yeah, this is a concern. One answer is to set a very high
MaxThreadLimit, but then you can't control how many PHP threads you have.
Another answer is to reserve some threads for I/O, which your design does.

> It's possible to partly work around this by implementing
> get_next_event() so that it completes all pending, unblocked
> writes before returning. But more generally, we'll need some
> solution to keep long-running, non-event-based requests from
> taking over all the server threads. (This is true of my design
> as well.)

Actually, in your design, since you have separate threads for I/O, I don't
see why it would suffer.
Re: request for comments: multiple-connections-per-thread MPM design
On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
> On Mon, 2002-11-25 at 00:20, Manoj Kasichainula wrote:
> > I was actually wondering why the reader and writer were separate
> > threads.
>
> It was a combination of several factors that convinced me
> to make them separate:
> * Take advantage of multiple CPUs more easily

Yeah, but as you noticed, once you get more than 2 CPUs, you have the same
problem. I'm just guessing here, but I imagine most CPU effort wouldn't be
expended in the actual kernel<->user transitions that are polls and
non-blocking I/O. And the meat of those operations could be handled by
other CPUs at the kernel level. So that separation onto multiple CPUs
might not help much.

> * Reduce the number of file descriptors that each poll call
>   is handling (important on platforms where we don't have
>   an efficient poll mechanism)

Has anyone read or benchmarked whether 2 threads polling 500 fds is faster
than 1 thread polling 1000?

> > For Linux 2.6, file notifications could be done entirely in userland in
> > the case where no blocking is needed, using "futexes".
>
> Thanks! I'll check out futexes.

Note that futexes are just Fast User mUTEXES. Those are already in the
kernel (according to some threads I read yesterday anyway). But I believe
the part about file notification using them is still in discussion.

> > But if you want to avoid the extra system calls, you could put a mutex
> > around maintenance of the pollset and just let the various threads dork
> > with it directly.
> >
> > I do keep mentioning this mutex around the select/poll :). Is there a
> > performance reason that you're trying to avoid it? In my past skimmings,
> > I've seen you post a lot of benchmarks and such, so maybe you've studied
> > this.
>
> The real reason I don't like the mutex around the poll is that
> it would add too much latency if we had to wait for the current
> poll to complete before adding a new descriptor. When the
> Listener accepts a new connection, or a Request Processor creates
> a new response brigade, it needs to get the corresponding socket
> added to the pollset immediately, which really requires interrupting
> the current poll.

Hmmm. That's a problem that needs solving even without the mutex though
(and it affects the design I proposed yesterday as well). When you're
adding a new fd to the reader or writer, you have to write to a pipe or
send a signal. The mutex shouldn't affect that.

BTW, ISTR Ryan commenting a while back that cross-thread signalling isn't
reliable, and it scares me in general, so I'd lean towards the pipe.

I'm pondering what else could be done about this; having to muck with a
pipe doesn't feel like the right thing to do. Perhaps I should actually
look at other people's code to see what they do. Other designs have
threads for disk I/O and such, so there should be a way.

I believe Windows doesn't have this problem, or at least hides it better,
because completion ports are independent entities that don't interact with
each other as far as the user is concerned.
Re: request for comments: multiple-connections-per-thread MPM design
On Sat, Nov 23, 2002 at 06:40:58PM -0800, Brian Pane wrote:
> Here's an outline of my latest thinking on how to build a
> multiple-connections-per-thread MPM for Apache 2.2. I'm
> eager to hear feedback from others who have been researching
> this topic.

You prodded me into finally writing up a proposal that's been bouncing
around in my head for a while now. That was in a separate message; this
will be suggestions for your proposal.

> 1. Listener thread
>    A Listener thread accept(2)s a connection, creates
>    a conn_rec for it, and sends it to the Reader thread.

Some (Most?) protocols have the server initiate the protocol negotiation
instead of the client, so the listener needs to be able to pass off to the
writer thread as well.

> * Limiting the Reader and Writer pools to one thread each will
>   simplify the design and implementation. But will this impair
>   our ability to take advantage of lots of CPUs?

I was actually wondering why the reader and writer were separate threads.
What gets more complex with a thread pool > 1? I know we'd have to add a
mutex around the select+(read|write), but is there something else?

> * Can we eliminate the listener thread? It would be faster to just
>   have the Reader thread include the listen socket(s) in its pollset.
>   But if we did that, we'd need some new way to synchronize the
>   accept handling among multiple child processes, because we can't
>   have the Reader thread blocking on an accept mutex when it has
>   existing connections to watch.

You could dispense with the listener thread in the single-process case and
just use an intraprocess mutex around select+(accept|read|write).

> * Is there a more efficient way to interrupt a thread that's
>   blocked in a poll call? That's a crucial step in the Listener-to-
>   Reader and Request Processor-to-Writer handoffs. Writing a byte
>   to a pipe requires two extra syscalls (a read and a write) per
>   handoff. Sending a signal to the target thread is the only
>   other solution I can think of at the moment, but that's bad
>   because the target thread might be in the middle of a read
>   or write call, rather than a poll, at the moment when we hit
>   it with a signal, so the read or write will fail with EINTR.

For Linux 2.6, file notifications could be done entirely in userland in
the case where no blocking is needed, using "futexes".

But if you want to avoid the extra system calls, you could put a mutex
around maintenance of the pollset and just let the various threads dork
with it directly.

I do keep mentioning this mutex around the select/poll :). Is there a
performance reason that you're trying to avoid it? In my past skimmings,
I've seen you post a lot of benchmarks and such, so maybe you've studied
this.

I'm suspicious of signals, but as long as they are tightly controlled with
sigprocmask or pthread_sigmask, I guess they aren't so bad.
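A C sketch of the "intraprocess mutex around select+(accept|read|write)"
suggestion above, for the single-process case: whichever worker thread
holds the mutex runs select(), either accepts a new connection or claims
one ready fd, and releases the mutex before doing the slow per-connection
work. All names are illustrative, error handling and the bookkeeping for
re-adding an fd after its request is handled are omitted; this is only a
sketch of the suggestion, not httpd code.

    /* Illustrative sketch only: listen_fd, watched_fds, and max_fd are
     * assumed to be set up at startup (socket/bind/listen, FD_ZERO/FD_SET);
     * handle_connection is whatever reads, processes, and writes one
     * request. */
    #include <pthread.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int listen_fd;
    fd_set watched_fds;        /* listen_fd plus idle connections */
    int max_fd;
    static pthread_mutex_t select_lock = PTHREAD_MUTEX_INITIALIZER;

    void handle_connection(int fd);   /* provided elsewhere */

    void worker(void)
    {
        for (;;) {
            fd_set ready;
            int fd, conn = -1;

            pthread_mutex_lock(&select_lock);
            ready = watched_fds;
            if (select(max_fd + 1, &ready, NULL, NULL, NULL) > 0) {
                if (FD_ISSET(listen_fd, &ready)) {
                    int newconn = accept(listen_fd, NULL, NULL);
                    if (newconn >= 0) {
                        FD_SET(newconn, &watched_fds);  /* watch it for a request */
                        if (newconn > max_fd)
                            max_fd = newconn;
                    }
                }
                for (fd = 0; fd <= max_fd && conn < 0; fd++) {
                    if (fd != listen_fd && FD_ISSET(fd, &ready)) {
                        conn = fd;
                        FD_CLR(fd, &watched_fds);  /* this thread owns it now */
                    }
                }
            }
            pthread_mutex_unlock(&select_lock);

            if (conn >= 0)
                handle_connection(conn);  /* the slow part, mutex not held */
        }
    }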
Another async I/O proposal [was Re: request for comments: multiple-connections-per-thread MPM design]
I have some suggestions for Brian's design proposal which I'm pondering
and writing up in another message, but meanwhile, I have an alternate
proposal that I've been rolling around inside my head for months now, so I
figured I might as well write it up.

It involves (mostly) a single pool of threads all running through an event
loop. I think the below could be written as a single MPM for a specific
operating system, or a generic MPM optimized for many OSes, or just APR.
It is also a hybrid sync/async approach, but most aspects of the approach
can be handled by a single thread pool instead of multiple.

Please punch holes in this proposal at will.

Definitions
-----------

Ticket - something to do, e.g. [READ, fd], [LISTEN, fd],
[WRITE, fd, buckets]. It's a request for the main event loop to give us
back an event.

Event - something that has been done (with some of the data used in it)
and its result, e.g. [READ, buckets], [LISTEN, fd], [WRITE], etc.

Both of the above include contexts for state maintenance, of course.

Event processor - receives events, processes them, decides on
consequences, and returns a new ticket to handle, or NULL if there is
none.

Design
------

We have a single pool of threads, growing and shrinking as needed, in a
standard event-handling loop:

    while (event = get_next_event())
        add more spare threads if needed
        event_processor = lookup_event_processor(event)
        ticket = event_processor(event)
        if (ticket) submit_ticket(ticket)
        exit loop (and thus end thread) if not needed

The event_processor can take as long as it wants, since there are other
threads who can wait for the next event.

Tickets could be handled in multiple disjoint iterations of the event
loop, but the event processors never see this. This is how Windows can
process a WRITE ticket for a file bucket with TransmitFile w/ completion
ports, Linux can (IIRC) use a non-blocking sendfile loop, and an
old-school unix can use a read-write loop. Note that I did mention
platform-specific code; does APR know how to do async and nonblocking I/O
for various platforms in the optimal way? If not, this loop could.

submit_ticket and get_next_event work together to provide the smarts of
the loop. On old-school unix, submit_ticket would take a ticket and set up
the fd_set, and get_next_event would select() on the fd_set and do what's
appropriate, which doesn't always involve a quick system call and a return
of an event. For example, while handling a WRITE ticket, we might only be
able to partially complete the write without blocking. In that case,
get_next_event could rejigger the fd_set and go back to the select() call.

HTTP's event_processors, in a simple case where all handlers read HTTP
request data, process it, and then return, look sort of like:

    http_listen_processor = http_request_processor

    http_request_processor(event)
        input_buckets += get_buckets(event)
        if (need_more_for_this_request)
            return new_read_ticket(fd, http_request_processor, context)
        else
            /* Next line can take a long time and can be written in a
             * blocking fashion */
            output_buckets = request_handler(fd, input_buckets)
            return new_write_ticket(fd, output_buckets,
                                    http_keepalive_processor, context)

    http_keepalive_processor(event)
        if (keepalive)
            return NULL
        else
            return new_read_ticket(fd, http_request_processor, context)

If we want to allow it, the request_handler() call above could even do its
own reading and writing of the file descriptor.

In the single process case on old-school Unix, submit_ticket can just tell
get_next_event to select+accept w/ a simple mutex around them.
In the multiple process case, it can wait on a queue for an outside
listener thread like in Brian's description. And in some Unixes (and I
believe Windows with completion ports), the multiprocess case isn't a
concern. Linux 2.6 could use epoll and avoid all these issues, and 2.4 has
a realtime signal interface to do the same thing, I believe.

I've glossed over where the conn_recs and request_recs get built. That's
mainly because I don't know how the multi-protocol stuff deals with
request_recs :). I would expect conn_recs to be completely generic, and
request_recs to be somewhat or completely http-specific. Generic portions
could go into the main event loop, HTTP portions go into the http event
processors.

Disadvantages of this proposal I can think of offhand:

- Because threads are mostly in one large pool, some common structures
  have to be protected through a mutex. I like paying for mutexes more
  than paying for context switches though.

- We're creating and destroying a lot of "objects" (tickets and events).
  I don't think there'll be much overhead since these aren't real OO
  objects, but we have to be careful.

Advantages:

- Async I/O, introduced gradually throughout the server. At first, this
  can just be yet another MPM, with no change to the rest of the server.
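For reference, a sketch of the ticket/event vocabulary from this proposal
rendered as C declarations plus the per-thread loop. Every type and
function name here is made up for illustration; get_next_event and
submit_ticket stand for the platform-specific half of the loop, and the
spare-thread bookkeeping is omitted.

    /* Illustrative sketch only: none of these are real httpd/APR types. */
    #include <stddef.h>

    typedef enum { TICKET_LISTEN, TICKET_READ, TICKET_WRITE } ticket_type;

    typedef struct event event;
    typedef struct ticket ticket;

    /* An event processor consumes an event and returns the next ticket to
     * submit, or NULL if this connection has nothing more to do for now. */
    typedef ticket *(*event_processor)(event *ev);

    struct ticket {
        ticket_type      type;
        int              fd;
        void            *data;      /* e.g. buckets to write            */
        event_processor  on_done;   /* who to call when the I/O is done */
        void            *context;   /* per-connection state             */
    };

    struct event {
        ticket *origin;             /* the ticket that produced this event */
        void   *result;             /* e.g. buckets that were read         */
    };

    event *get_next_event(void);     /* may block in select/poll/etc. */
    void   submit_ticket(ticket *t); /* adds work for the event loop  */

    static event_processor lookup_event_processor(event *ev)
    {
        return ev->origin->on_done;
    }

    /* The per-thread loop from the proposal, minus the spare-thread and
     * exit-if-not-needed bookkeeping. */
    void worker_loop(void)
    {
        event *ev;

        while ((ev = get_next_event()) != NULL) {
            event_processor proc = lookup_event_processor(ev);
            ticket *next = proc(ev);
            if (next)
                submit_ticket(next);
        }
    }

And a sketch of the epoll variant mentioned above: because epoll_ctl() may
be called by one thread while another is blocked in epoll_wait(), adding a
new fd needs no wakeup pipe at all. The epoll calls are real Linux APIs;
the wrapper names are illustrative.

    #include <sys/epoll.h>

    static int epfd;

    void event_loop_init(void)
    {
        epfd = epoll_create(256);    /* size hint, ignored by newer kernels */
    }

    /* Called from any thread (e.g. submit_ticket) to add work. */
    void watch_fd(int fd, unsigned int events)   /* EPOLLIN and/or EPOLLOUT */
    {
        struct epoll_event ev;
        ev.events = events;
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev); /* takes effect immediately */
    }

    /* Called by the poller thread(s). */
    int wait_for_events(struct epoll_event *out, int max, int timeout_ms)
    {
        return epoll_wait(epfd, out, max, timeout_ms);
    }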
request for comments: multiple-connections-per-thread MPM design
Here's an outline of my latest thinking on how to build a
multiple-connections-per-thread MPM for Apache 2.2. I'm eager to hear
feedback from others who have been researching this topic.

Thanks,
Brian

Overview
--------

The design described here is a hybrid sync/async architecture:

* Do the slow part of request processing--network reads and writes--in an
  event loop for scalability.

* Do the fast part of request processing--everything other than network
  I/O--in a one-request-per-thread mode so that module developers don't
  have to rewrite all their code as reentrant state machines.

Basic structure
---------------

Each httpd child process has four thread pools:

1. Listener thread

   A Listener thread accept(2)s a connection, creates a conn_rec for it,
   and sends it to the Reader thread.

2. Reader thread

   A Reader thread runs a poll loop to watch for incoming data on all
   connections that have been passed to it by a Listener or Writer. It
   reads the next request from each connection, builds a request_rec, and
   passes the conn_rec and the request_rec on to the Request Processor
   thread pool.

3. Request Processor threads

   Each Request Processor thread handles one request_rec at a time. When
   it receives a request from the Reader thread, the Request Processor
   runs all the request processing hooks (auth, map to storage, handler,
   etc) except the logger, plus the output filter stack except the
   core_output_filter. As the Request Processor produces output brigades,
   it sends them to the Writer thread pool. Once the Request processor
   has finished handling the request, it sends the last of the output
   data, plus the request_rec, to the Writer.

4. Writer thread

   The Writer thread runs a poll loop to output the data for all
   connections that have been passed to it. When it finishes writing the
   response for a request, the Writer calls the logger, destroys the
   request_rec, and either executes the lingering_close on the connection
   or sends the connection back to the Reader, depending on whether the
   connection is a keep-alive.

Component details
-----------------

* Listener thread: This thread will need to use an accept_mutex to
  serialize the accept, just like 2.0 does.

* Passing connections from Listener to Reader: When the Listener creates
  a new connection, it adds it to a global queue and writes one byte to a
  pipe. The other end of the pipe is in the Reader's pollset. When the
  poll(2) in the Reader completes, the Reader detects the data available
  on the pipe, reads and discards the byte, and retrieves all the new
  connections in the queue.

* Passing connections from Reader to Request Processor: When the Reader
  has consumed all the data in a connection, it adds the connection and
  the newly created request_rec to a global queue and signals a condition
  variable. The idle Request Processor threads take turns waiting on the
  condition variable (leader/followers model).

* Passing output brigades from Request Processor to Writer: Same model as
  the Listener-to-Reader handoff: add to a queue, and write a byte to a
  pipe.

* Bucket management: Implicit in this design is the idea that the Writer
  thread can be writing part of an HTTP response while a Request Processor
  thread is still generating more buckets for that request. This is a good
  thing because it means that the Request Processor thread won't ever find
  itself blocked on a network write, so it can produce all its output
  quickly and move on to another request (which is the key to keeping the
  number of threads low).
  However, it does mean that we need a thread-safe solution for allocating
  and destroying buckets and brigades.

* request_rec lifetime: When a Request Processor thread has produced all
  of the output for a response, it adds a metadata bucket to the last
  output brigade. This bucket points to the request_rec. Upon sending the
  last of the request's output, the Writer thread is responsible for
  calling the logger and destroying the request and its pool. This would
  be a major change from how 1.x and 2.0 work. The rationale for it is
  twofold:

  - Eliminate the need to set aside buckets from the request pool into the
    connection pool in the core_output_filter, which has been a source of
    many bugs in 2.0.

  - Allow for more accurate logging of bytes_sent (e.g., in mod_logio) by
    delaying the logger until the request has actually been sent.

  One implication of this change is that the request pool could no longer
  be a sub-pool of the connection pool, unless we make subpool creation a
  thread-safe operation.

Open questions
--------------

* Limiting the Reader and Writer pools to one thread each will simplify
  the design and implementation. But will this impair our ability to take
  advantage of lots of CPUs?

* Ca
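A C sketch of the Reader-to-Request-Processor handoff described under
"Component details": a mutex-protected queue plus a condition variable
that idle Request Processor threads block on, approximating the
leader/followers handoff. request_item stands in for the real
(conn_rec, request_rec) pair; all names are illustrative, not actual
httpd types.

    /* Illustrative sketch only: request_item stands in for the real
     * (conn_rec, request_rec) pair handed from the Reader to a Request
     * Processor thread. */
    #include <pthread.h>
    #include <stddef.h>

    struct request_item {
        void                *conn;      /* stand-in for conn_rec *    */
        void                *request;   /* stand-in for request_rec * */
        struct request_item *next;
    };

    static struct request_item *queue_head, *queue_tail;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_nonempty = PTHREAD_COND_INITIALIZER;

    /* Called by the Reader thread once a full request has been parsed. */
    void enqueue_request(struct request_item *item)
    {
        item->next = NULL;
        pthread_mutex_lock(&queue_lock);
        if (queue_tail)
            queue_tail->next = item;
        else
            queue_head = item;
        queue_tail = item;
        pthread_cond_signal(&queue_nonempty);   /* wake one idle processor */
        pthread_mutex_unlock(&queue_lock);
    }

    /* Called by each Request Processor thread; blocks until work arrives. */
    struct request_item *dequeue_request(void)
    {
        struct request_item *item;

        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_nonempty, &queue_lock);
        item = queue_head;
        queue_head = item->next;
        if (queue_head == NULL)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);
        return item;
    }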