> I've been thinking about strategies for building a
> multiple-connection-per-thread MPM for 2.0.  It's
> conceptually easy to do this:
>
>   * Start with worker.
>
>   * Keep the model of one worker thread per request,
>     so that blocking or CPU-intensive modules don't
>     need to be rewritten as state machines.
>
>   * In the core output filter, instead of doing
>     actual socket writes, hand off the output
>     brigades to a "writer thread."

How about implementing the event loop in a filter above the
core_output_filter (in net_time_filter perhaps)?  We need to formalize the
rules for writing a filter that handles receiving APR_EAGAIN (or
APR_EWOULDBLOCK) and maybe even APR_IO_PENDING (for true async i/o), so
let's start with core_output_filter.  (I am thinking ahead to the time when
we decide to move the event_filter higher in the filter stack, before the
chunk filter for instance.)
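
To make that convention concrete, here is a rough sketch (not existing
code) of what a filter above the event loop might do when the filter below
it reports APR_EAGAIN; save_brigade_aside() is a hypothetical helper that
parks the unwritten buckets until the event loop says the socket is
writable again:

#include "httpd.h"
#include "util_filter.h"

static apr_status_t some_output_filter(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
{
    apr_status_t rv = ap_pass_brigade(f->next, bb);

    if (APR_STATUS_IS_EAGAIN(rv)) {
        /* Not an error: set the remaining data aside and return, so the
         * event loop can retry once the socket becomes writable. */
        return save_brigade_aside(f, bb);
    }
    return rv;
}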

>
>   * As soon as the worker thread has sent an EOS
>     to the writer thread, let the worker thread
>     move on to the next request.
>
>   * In the writer thread, use a big event loop
>     (with /dev/poll or RT signals or kqueue, depending
>     on platform) to do nonblocking writes for all
>     open connections.
>
> This would allow us to use a much smaller number of
> worker threads for the same amount of traffic
> (at least for typical workloads in which the network
> write time constitutes the majority of each request's
> duration).

This is the right (write:-) optimization.  Looking at server-status on
apache.org shows most processes are busy writing responses.

>
> The problem, though, is that passing brigades between
> threads is unsafe:
>
>   * The bucket allocator alloc/free code isn't
>     thread-safe, so bad things will happen if the
>     writer thread tries to free a bucket (that's
>     just been written to the client) at the same
>     time that a worker thread is allocating a new
>     bucket for a subsequent request on the same
>     connection.
>
>   * If we delete the request pool when the worker
>     thread finishes its work on the request, the
>     pool cleanup will close the underlying objects
>     for the request's file/pipe/mmap/etc buckets.
>     When the writer thread tries to output these
>     buckets, the writes will fail.
>
> There are other ways to structure an async MPM, but
> in almost all cases we'll face the same problem:
> buckets that get created by one thread must be
> delivered and then freed by a different thread, and
> the current memory management design can't handle
> that.
>
> The cleanest solution I've thought of so far is:
>
>   * Modify the bucket allocator code to allow
>     thread-safe alloc/free of buckets.  For the
>     common cases, it should be possible to do
>     this without mutexes by using apr_atomic_cas()
>     based spin loops.  (There will be at most two
>     threads contending for the same allocator--
>     one worker thread and the writer thread--so
>     the amount of spinning should be minimal.)
>
>   * Don't delete the request pool at the end of
>     a request.  Instead, delay its deletion until
>     the last bucket from that request is sent.
>     One way to do this is to create a new metadata
>     bucket type that stores the pointer to the
>     request pool.  The worker thread can append
>     this metadata bucket to the output brigade,
>     right before the EOS.  The writer thread then
>     reads the metadata bucket and deletes (or
>     clears and recycles) the referenced pool after
>     sending the response.  This would mean, however,
>     that the request pool couldn't be a subpool of
>     the connection pool.  The writer thread would have
>     to be careful to clean up the request pool(s)
>     upon connection abort.
>
> I'm eager to hear comments from others who have looked
> at the async design issues.
>
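
The apr_atomic_cas() based free path described above might look roughly
like this (a sketch only: the names are made up, it assumes a pointer-sized
CAS primitive along the lines of apr_atomic_casptr(), and a matching pop
would also have to deal with the ABA problem):

#include "apr_atomic.h"

struct bucket_node {
    struct bucket_node *next;
};

struct bucket_freelist {
    struct bucket_node *volatile head;
};

/* Push a freed bucket onto the allocator's freelist without a mutex. */
static void freelist_push(struct bucket_freelist *list,
                          struct bucket_node *node)
{
    void *old;
    do {
        old = list->head;
        node->next = old;
    } while (apr_atomic_casptr((volatile void **)&list->head, node, old)
             != old);
}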

When a brigade containing an EOS is passed to the event loop by a worker
thread, ownership of the brigade and all its buckets passes to the event
loop, and the worker should not even touch memory in any of the buckets.
The worker cannot help but touch the brigade itself though (as part of
unwinding the call chain).  The key here, I think, is to reference count
the brigade (or a struct that references the brigade); the brigade is freed
by the thread that drops the ref count to 0.  (We reference count
cache_objects in mod_mem_cache.c to handle race conditions between threads
serving a cache object and other threads garbage collecting the same cache
object.)  Ditto for clearing the ptrans pool, which implies that we should
create a new ref counted object (ap_work_t?) that contains references to
ptrans, the brigade, and anything else we see fit to stick there (like a
buffer for doing true async reads).  A simplification would be to not
attempt to buffer up responses to pipelined requests.
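
A minimal sketch of what I mean by the ref counted work element follows;
ap_work_t and the field names are just placeholders, and it assumes APR's
atomic increment/decrement (apr_atomic_inc32/apr_atomic_dec32 in later
APR).  Whichever thread drops the count to zero does the cleanup:

#include "apr_atomic.h"
#include "apr_buckets.h"
#include "apr_pools.h"

typedef struct ap_work_t {
    apr_uint32_t refcount;      /* shared by the worker and the event loop */
    apr_pool_t *ptrans;         /* transaction pool, recycled at count 0   */
    apr_bucket_brigade *bb;     /* response brigade handed to the writer   */
    char *read_buf;             /* caller-owned buffer for true async reads */
    apr_size_t read_len;
} ap_work_t;

static void ap_work_ref(ap_work_t *work)
{
    apr_atomic_inc32(&work->refcount);
}

static void ap_work_unref(ap_work_t *work)
{
    /* apr_atomic_dec32() returns zero when the count reaches zero, so the
     * last thread out does the cleanup, whichever thread that is. */
    if (apr_atomic_dec32(&work->refcount) == 0) {
        apr_brigade_destroy(work->bb);
        apr_pool_clear(work->ptrans);   /* or hand the pool back for reuse */
    }
}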

Supporting ED/async reads is a bit more difficult.  There are two
fundamental problems:

* The functions that kick off network reads (ap_read_request and
ap_get_mime_headers_core) are written to expect a successful read and
cannot handle being re-entered multiple times for the same request.  This
can be fixed relatively easily by implementing a simple state machine used
by ap_process_http_connection() and refactoring the code in ap_read_request
and ap_get_mime_headers to separate the network reads from the code that
processes the received bytes.  This work is needed for both event driven
reads and true async reads.
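
Something along these lines for the state machine (the states are
illustrative only, not a worked-out design):

typedef enum {
    CONN_STATE_READ_REQUEST_LINE,   /* still reading the request line    */
    CONN_STATE_READ_MIME_HEADERS,   /* still reading the header block    */
    CONN_STATE_PROCESS_REQUEST,     /* headers parsed, run the handler   */
    CONN_STATE_WRITE_RESPONSE       /* response handed off to the writer */
} conn_state_e;

/* ap_process_http_connection() would record the current state (in the
 * conn_rec or the work element) and, on APR_EAGAIN, simply return to the
 * event loop; when the socket is readable again it re-enters at the saved
 * state instead of assuming the whole request line or header block has
 * already arrived. */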

* ap_get_brigade and ap_bucket_read allocate buckets under the covers (deep
in the bucket code).  This is -really- bad for async i/o, because async i/o
relies on the buffers passed to a read not being freed out from under the
read (this is not an issue for an event driven read, I think).  One
solution is to reimplement ap_get_brigade and the bucket read code to
accept a buffer (and len field) supplied by the application.  This buffer
would live in the reference counted ap_work_t 'work element' mentioned
earlier.
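
For example (purely hypothetical name and signature), something like:

/* A variant of ap_get_brigade() that reads into storage owned by the work
 * element, so an in-flight async read never has its buffer freed out from
 * under it.  On input *len is the buffer size; on output it is the number
 * of bytes actually read. */
apr_status_t ap_get_brigade_into(ap_filter_t *f, apr_bucket_brigade *bb,
                                 ap_input_mode_t mode, apr_read_type_e block,
                                 char *buf, apr_size_t *len);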

I would like to see all reads done ED/async, handlers making the decision
on whether they want to do ED/async writes, and filters handling receipt of
APR_EAGAIN and APR_IO_PENDING correctly.  Enabling ED/async writes is a
good first step.

Bill
