Here's an outline of my latest thinking on how to build a multiple-connections-per-thread MPM for Apache 2.2. I'm eager to hear feedback from others who have been researching this topic.
Thanks,
Brian

Overview
--------

The design described here is a hybrid sync/async architecture:

* Do the slow part of request processing--network reads and
  writes--in an event loop for scalability.

* Do the fast part of request processing--everything other than
  network I/O--in a one-request-per-thread mode so that module
  developers don't have to rewrite all their code as reentrant
  state machines.

Basic structure
---------------

Each httpd child process has four thread pools:

1. Listener thread

   A Listener thread accept(2)s a connection, creates a conn_rec
   for it, and sends it to the Reader thread.

2. Reader thread

   A Reader thread runs a poll loop to watch for incoming data on
   all connections that have been passed to it by a Listener or
   Writer.  It reads the next request from each connection, builds
   a request_rec, and passes the conn_rec and the request_rec on
   to the Request Processor thread pool.

3. Request Processor threads

   Each Request Processor thread handles one request_rec at a
   time.  When it receives a request from the Reader thread, the
   Request Processor runs all the request processing hooks (auth,
   map to storage, handler, etc) except the logger, plus the
   output filter stack except the core_output_filter.  As the
   Request Processor produces output brigades, it sends them to
   the Writer thread pool.  Once the Request Processor has
   finished handling the request, it sends the last of the output
   data, plus the request_rec, to the Writer.

4. Writer thread

   The Writer thread runs a poll loop to output the data for all
   connections that have been passed to it.  When it finishes
   writing the response for a request, the Writer calls the
   logger, destroys the request_rec, and either executes the
   lingering_close on the connection or sends the connection back
   to the Reader, depending on whether the connection is a
   keep-alive.

Component details
-----------------

* Listener thread: This thread will need to use an accept_mutex
  to serialize the accept, just like 2.0 does.

* Passing connections from Listener to Reader: When the Listener
  creates a new connection, it adds it to a global queue and
  writes one byte to a pipe.  The other end of the pipe is in the
  Reader's pollset.  When the poll(2) in the Reader completes,
  the Reader detects the data available on the pipe, reads and
  discards the byte, and retrieves all the new connections from
  the queue.  (A rough sketch of this queue-plus-pipe handoff
  appears below.)

* Passing connections from Reader to Request Processor: When the
  Reader has consumed all the data in a connection, it adds the
  connection and the newly created request_rec to a global queue
  and signals a condition variable.  The idle Request Processor
  threads take turns waiting on the condition variable
  (leader/followers model).  (Also sketched below.)

* Passing output brigades from Request Processor to Writer: Same
  model as the Listener-to-Reader handoff: add to a queue, and
  write a byte to a pipe.

* Bucket management: Implicit in this design is the idea that the
  Writer thread can be writing part of an HTTP response while a
  Request Processor thread is still generating more buckets for
  that request.  This is a good thing because it means that the
  Request Processor thread won't ever find itself blocked on a
  network write, so it can produce all its output quickly and
  move on to another request (which is the key to keeping the
  number of threads low).  However, it does mean that we need a
  thread-safe solution for allocating and destroying buckets and
  brigades.
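
  To make the two handoffs above concrete, here is a rough sketch
  of the queue-plus-pipe mechanism, written against APR.  The
  conn_queue_t type, its ring-buffer layout, and the function names
  are hypothetical, not existing httpd code; only the apr_file_*,
  apr_thread_mutex_*, and pollset calls are real APR APIs.

      #include "httpd.h"              /* conn_rec, request_rec */
      #include "apr_file_io.h"
      #include "apr_poll.h"
      #include "apr_thread_mutex.h"

      typedef struct conn_queue_t {
          apr_thread_mutex_t *lock;
          conn_rec          **conns;   /* ring buffer of pending connections */
          int                 head, tail, size;
      } conn_queue_t;

      static apr_file_t *wakeup_read;   /* this end sits in the Reader's pollset */
      static apr_file_t *wakeup_write;  /* the Listener writes to this end */

      /* Listener side: enqueue the new connection, then wake the Reader */
      static apr_status_t pass_to_reader(conn_queue_t *q, conn_rec *c)
      {
          apr_size_t len = 1;

          apr_thread_mutex_lock(q->lock);
          q->conns[q->tail] = c;
          q->tail = (q->tail + 1) % q->size;
          apr_thread_mutex_unlock(q->lock);

          /* one byte is enough; the Reader reads and discards it */
          return apr_file_write(wakeup_write, "x", &len);
      }

      /* Reader side: called when apr_pollset_poll() reports the pipe readable */
      static void drain_new_connections(conn_queue_t *q, apr_pollset_t *pollset)
      {
          char buf[64];
          apr_size_t len = sizeof(buf);

          apr_file_read(wakeup_read, buf, &len);   /* discard the wakeup byte(s) */

          apr_thread_mutex_lock(q->lock);
          while (q->head != q->tail) {
              conn_rec *c = q->conns[q->head];
              q->head = (q->head + 1) % q->size;
              /* ... add c's socket to the Reader's pollset here ... */
          }
          apr_thread_mutex_unlock(q->lock);
      }

  And a similar sketch of the condition-variable handoff from the
  Reader to the Request Processors (request_queue_t is again a
  hypothetical type, not existing code):

      #include "apr_thread_cond.h"

      typedef struct request_queue_t {
          apr_thread_mutex_t *lock;
          apr_thread_cond_t  *not_empty;
          request_rec       **reqs;    /* ring buffer of parsed requests */
          int                 head, tail, size;
      } request_queue_t;

      /* Reader side: hand a parsed request to one idle Request Processor */
      static void pass_to_processor(request_queue_t *q, request_rec *r)
      {
          apr_thread_mutex_lock(q->lock);
          q->reqs[q->tail] = r;
          q->tail = (q->tail + 1) % q->size;
          apr_thread_cond_signal(q->not_empty);    /* wake exactly one waiter */
          apr_thread_mutex_unlock(q->lock);
      }

      /* Request Processor side: idle threads take turns blocking here */
      static request_rec *wait_for_request(request_queue_t *q)
      {
          request_rec *r;

          apr_thread_mutex_lock(q->lock);
          while (q->head == q->tail) {             /* guards against spurious wakeups */
              apr_thread_cond_wait(q->not_empty, q->lock);
          }
          r = q->reqs[q->head];
          q->head = (q->head + 1) % q->size;
          apr_thread_mutex_unlock(q->lock);
          return r;
      }
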
* request_rec lifetime: When a Request Processor thread has
  produced all of the output for a response, it adds a metadata
  bucket to the last output brigade.  This bucket points to the
  request_rec.  Upon sending the last of the request's output,
  the Writer thread is responsible for calling the logger and
  destroying the request and its pool.

  This would be a major change from how 1.x and 2.0 work.  The
  rationale for it is twofold:

  - Eliminate the need to set aside buckets from the request pool
    into the connection pool in the core_output_filter, which has
    been a source of many bugs in 2.0.

  - Allow for more accurate logging of bytes_sent (e.g., in
    mod_logio) by delaying the logger until the request has
    actually been sent.

  One implication of this change is that the request pool could
  no longer be a sub-pool of the connection pool, unless we make
  subpool creation a thread-safe operation.

Open questions
--------------

* Limiting the Reader and Writer pools to one thread each will
  simplify the design and implementation.  But will this impair
  our ability to take advantage of lots of CPUs?

* Can we eliminate the Listener thread?  It would be faster to
  just have the Reader thread include the listen socket(s) in its
  pollset.  But if we did that, we'd need some new way to
  synchronize the accept handling among multiple child processes,
  because we can't have the Reader thread blocking on an accept
  mutex when it has existing connections to watch.

* Is there a more efficient way to interrupt a thread that's
  blocked in a poll call?  That's a crucial step in the
  Listener-to-Reader and Request-Processor-to-Writer handoffs.
  Writing a byte to a pipe requires two extra syscalls (a read
  and a write) per handoff.  Sending a signal to the target
  thread is the only other solution I can think of at the moment,
  but that's bad because the target thread might be in the middle
  of a read or write call, rather than a poll, at the moment when
  we hit it with a signal, so the read or write will fail with
  EINTR.  Maybe the best solution would be a hybrid: using atomic
  operations, have the Reader maintain a flag that indicates
  whether or not it's blocked in a poll call.  If the Listener
  sees that the Reader is blocked in a poll, it sends a signal to
  the Reader to interrupt the poll; otherwise, it just adds the
  new connection to the queue and expects the Reader to check the
  queue again before its next poll call.  (A sketch of this
  hybrid appears at the end of this message.)

* Do any major modules have a need to do blocking I/O or
  expensive computation within their input handlers?  That would
  cause problems for the single Reader thread, which depends on
  input handlers running quickly so it can get back to its poll
  loop.
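
For the hybrid wakeup idea, here is a rough sketch using APR's
atomic operations.  The reader_state flag and the ordering of "set
the flag, then re-check the queue" are my own additions, included
to avoid a lost-wakeup race; only the apr_atomic_* and pollset
calls are real APR APIs, and conn_queue_t is the hypothetical type
from the earlier sketch.

    #include "apr_atomic.h"
    #include "apr_poll.h"

    #define READER_RUNNING 0   /* Reader is between poll calls */
    #define READER_POLLING 1   /* Reader is (about to be) blocked in poll */

    static volatile apr_uint32_t reader_state = READER_RUNNING;

    /* Reader side: one iteration of the poll loop */
    static void reader_wait_for_events(apr_pollset_t *pollset, conn_queue_t *q)
    {
        apr_int32_t num;
        const apr_pollfd_t *descs;
        int queue_empty;

        /* Publish "polling" BEFORE the final queue check, so a Listener
         * that enqueues after this point is guaranteed to see the flag
         * and send a wakeup; checking in the other order could lose a
         * wakeup and leave the Reader blocked indefinitely. */
        apr_atomic_set32(&reader_state, READER_POLLING);

        apr_thread_mutex_lock(q->lock);
        queue_empty = (q->head == q->tail);
        apr_thread_mutex_unlock(q->lock);

        if (!queue_empty) {
            apr_atomic_set32(&reader_state, READER_RUNNING);
            return;            /* drain the queue instead of blocking */
        }

        apr_pollset_poll(pollset, -1, &num, &descs);
        apr_atomic_set32(&reader_state, READER_RUNNING);
        /* ... handle ready descriptors, then drain the queue ... */
    }

    /* Listener side: after enqueueing a new connection */
    static void wake_reader_if_polling(void)
    {
        if (apr_atomic_read32(&reader_state) == READER_POLLING) {
            /* interrupt the poll: signal the Reader thread, or fall
             * back to the one-byte pipe write sketched earlier */
        }
        /* otherwise do nothing: the Reader will notice the queued
         * connection when it re-checks before its next poll */
    }
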