Re: Another async I/O proposal [was Re: request for comments:multiple-connections-per-thread MPM design]

Brian Pane Mon, 25 Nov 2002 08:06:49 -0800

On Mon, 2002-11-25 at 00:02, Manoj Kasichainula wrote:
> I have some suggestions for Brian's design proposal which I'm pondering
> and writing up in another message, but meanwhile, I have an alternate
> proposal that I've been rolling around inside my head for months now, so
> I figured I might as well write it up.
> 
> It involves (mostly) a single pool of threads all running through an
> event loop. I think the below could be written as a single MPM for a
> specific operating system, or a generic MPM optimized for many OSes, or
> just APR.
> 
> It is also a hybrid sync/async approach, but most aspects of the approach
> can be handled by a single thread pool instead of multiple.
> 
> Please punch holes in this proposal at will.


In general, I like this design.  It provides a simple solution
for mixing event-driven and non-event-driven modules in the same
server.  I see a few problems, though, as detailed below:

> Definitions
> -----------
> 
> Ticket - something to do, e.g. [READ, fd], [LISTEN, fd], [WRITE, fd,
> buckets]. It's a request for the main event loop to give us back an
> event.
> 
> Event - something that has been done (with some of the data used in it)
> and its result, e.g. [READ, buckets], [LISTEN, fd], [WRITE], etc.
> 
> Both of the above include contexts for state maintenance, of course.
> 
> Event processor - receives events, processes them, decides on
> consequences, and returns a new ticket to handle, or NULL if there is
> none.
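To make the vocabulary concrete, here is one possible C layout for tickets, events, and processors. Every name (`ticket_t`, `event_t`, `new_ticket`, `complete_ticket`) is hypothetical, and a plain buffer stands in for buckets. It's a sketch of the shape, not an API from httpd or APR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types sketching the proposal's vocabulary; not httpd/APR API. */
typedef enum { OP_LISTEN, OP_READ, OP_WRITE } op_t;

struct event;
struct ticket;

/* An event processor consumes an event and returns the next ticket, or NULL. */
typedef struct ticket *(*event_processor_fn)(struct event *ev);

typedef struct ticket {
    op_t op;                      /* what the event loop should do */
    int fd;                       /* descriptor it applies to */
    const char *buf;              /* WRITE: data to send (stands in for buckets) */
    size_t len;                   /* WRITE: bytes in buf */
    event_processor_fn processor; /* who handles the resulting event */
    void *context;                /* per-connection state */
} ticket_t;

typedef struct event {
    op_t op;                      /* which operation completed */
    int fd;
    long result;                  /* bytes moved, new fd, or a negative error */
    event_processor_fn processor; /* copied from the ticket */
    void *context;                /* copied from the ticket */
} event_t;

ticket_t *new_ticket(op_t op, int fd, event_processor_fn proc, void *ctx)
{
    ticket_t *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->op = op;
    t->fd = fd;
    t->processor = proc;
    t->context = ctx;
    return t;
}

/* When the loop finishes a ticket, it turns it into an event for the processor. */
event_t complete_ticket(const ticket_t *t, long result)
{
    event_t ev = { t->op, t->fd, result, t->processor, t->context };
    return ev;
}
```

The processor pointer and context travel from ticket to event unchanged, which is what lets the loop finish a ticket over several iterations without the processor noticing.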
> 
> 
> Design
> ------
> 
> We have a single pool of threads, growing and shrinking as needed, in a
> standard event-handling loop:
> 
> while (event = get_next_event())
>    add more spare threads if needed
>    event_processor = lookup_event_processor(event)
>    ticket = event_processor(event)
>    if (ticket) submit_ticket(ticket)
>    exit loop (and thus end thread) if not needed
> 
> The event_processor can take as long as it wants, since there are other
> threads who can wait for the next event.

Where is the locking done?  Is the lock just around the
get_next_event() call?
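One plausible answer, sketched in C under assumptions: a single mutex guards only the shared event source and pollset, so get_next_event() and submit_ticket() are serialized while event processors run unlocked. The ready queue, `loop_lock`, `post_event`, and `demo_processor` are hypothetical stand-ins. In a real server get_next_event() would block in select() or poll() while holding the lock, which amounts to a leader/followers arrangement: one thread waits in the kernel and the rest wait on the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch, not httpd code. */
typedef struct event event_t;
typedef struct ticket { int id; } ticket_t;
typedef ticket_t *(*event_processor_fn)(event_t *ev);
struct event { int id; event_processor_fn processor; };

#define QCAP 64

static pthread_mutex_t loop_lock = PTHREAD_MUTEX_INITIALIZER;
static event_t ready[QCAP];          /* stand-in for the poller's ready list */
static size_t ready_head, ready_tail;
static ticket_t *submitted[QCAP];    /* stand-in for the pollset */
static size_t nsubmitted;

/* Caller holds loop_lock. A real version would block in select()/poll()
 * here, so only one thread at a time waits in the kernel. */
static int get_next_event(event_t *out)
{
    if (ready_head == ready_tail)
        return 0;
    *out = ready[ready_head++ % QCAP];
    return 1;
}

/* Touches the shared pollset, so it takes the same lock. */
static void submit_ticket(ticket_t *t)
{
    pthread_mutex_lock(&loop_lock);
    assert(nsubmitted < QCAP);
    submitted[nsubmitted++] = t;
    pthread_mutex_unlock(&loop_lock);
}

/* Body of every worker thread; returns how many events it handled. */
static int worker_loop(void)
{
    int handled = 0;
    for (;;) {
        event_t ev;
        pthread_mutex_lock(&loop_lock);
        int got = get_next_event(&ev);
        pthread_mutex_unlock(&loop_lock);
        if (!got)
            break;                        /* nothing to do: thread may exit */
        ticket_t *t = ev.processor(&ev);  /* may block; lock is NOT held */
        if (t)
            submit_ticket(t);
        handled++;
    }
    return handled;
}

/* Demo helpers: an example processor and a way to inject ready events. */
static ticket_t tickets[QCAP];

static ticket_t *demo_processor(event_t *ev)
{
    if (ev->id % 2 != 0)
        return NULL;                      /* odd ids: done, no new ticket */
    tickets[ev->id].id = ev->id;
    return &tickets[ev->id];
}

static void post_event(int id, event_processor_fn proc)
{
    pthread_mutex_lock(&loop_lock);
    ready[ready_tail++ % QCAP] = (event_t){ id, proc };
    pthread_mutex_unlock(&loop_lock);
}
```

The design choice here is that the lock is never held across ev.processor(), so a slow, blocking processor ties up only its own thread.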

> Tickets could be handled in multiple disjoint iterations of the event
> loop, but the event processors never see this. This is how Windows can
> process a WRITE ticket for a file bucket with TransmitFile w/ completion
> ports, Linux can (IIRC) use a non-blocking sendfile loop, and an
> old-school unix can use a read-write loop. Note that I did mention
> platform-specific code; does APR know how to do async and nonblocking
> I/O for various platforms in the optimal way? If not, this loop could.

APR handles much of the work: it provides a sendfile API, for example,
that's ifdef'ed to do sendfilev on Solaris, sendfile on Linux, and
mmap+writev on older platforms.  Based on our experiences with the
core_output_filter in 2.0, though, I expect that get_next_event()
will still have to do some platform-specific processing so that it
knows when to cork/un-cork the connection on Linux, for example.

> submit_ticket and get_next_event work together to provide the smarts of
> the loop. On old-school unix, submit_ticket would take a ticket and set
> up the fd_set, and get_next_event would select() on the fd_set and do
> what's appropriate, which doesn't always involve a quick system call and
> a return of an event. For example, while handling a WRITE ticket, we
> might only be able to partially complete the write without blocking. In
> that case, get_next_event could rejigger the fd_set and go back to the
> select() call.
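The partial-write case can be sketched as a helper that get_next_event() calls whenever a WRITE ticket's descriptor is writable. The names (`write_ticket_t`, `continue_write`, `set_nonblocking`) are hypothetical, a flat buffer replaces buckets, and plain nonblocking write() is used rather than sendfile. It reports completion only once every byte is out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical in-progress WRITE ticket kept inside get_next_event. */
typedef struct {
    int fd;
    const char *buf;
    size_t len;
    size_t off;     /* bytes already written */
} write_ticket_t;

/* Push as much as the kernel will take without blocking.
 * Returns 1 when the ticket is complete (emit a WRITE event),
 * 0 when the fd would block (keep it in the fd_set and select() again),
 * -1 on a real error. */
int continue_write(write_ticket_t *t)
{
    while (t->off < t->len) {
        ssize_t n = write(t->fd, t->buf + t->off, t->len - t->off);
        if (n > 0)
            t->off += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;                       /* interrupted: just retry */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                       /* partial: wait for writability */
        else
            return -1;
    }
    return 1;
}

/* Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

A return of 0 leaves the ticket pending for the next select(); the event processor never sees the intermediate state.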
> 
> In a simple case where all handlers read the HTTP request data, process
> it, and then return, HTTP's event_processors look sort of like:
> 
> http_listen_processor = http_request_processor
>    
> http_request_processor(event)
>     input_buckets += get_buckets(event)
>     if (need_more_for_this_request)
>         return new_read_ticket(fd, http_request_processor, context)
>     else
>         /* Next line can take a long time and can be written in a
>          * blocking fashion */
>         output_buckets = request_handler(fd, input_buckets)
>         return new_write_ticket(fd, output_buckets,
>                                 http_keepalive_processor, context)


Once the http_request_processor() has created a new ticket for
the write, how does submit_ticket() get the socket added into
the pollset?  If another thread can be in the middle of a poll
call at the same time, does submit_ticket() interrupt the poll
in order to add the new descriptor?
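One common way to let submit_ticket() interrupt a poll in progress is the self-pipe trick: the poller always watches the read end of a pipe, and submit_ticket() writes a byte to the other end. The sketch assumes that approach (the proposal doesn't specify one); `wake_poller`, `poll_once`, and `wakeup_init` are hypothetical names. On Linux, an eventfd, or epoll (whose interest set can be modified while another thread waits in epoll_wait()), would be alternatives.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int wake_fds[2];   /* [0] is always in the pollset; [1] is written to */

static int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    return 0;
}

int wakeup_init(void)
{
    if (pipe(wake_fds) != 0)
        return -1;
    if (set_nonblocking(wake_fds[0]) < 0 || set_nonblocking(wake_fds[1]) < 0)
        return -1;
    return 0;
}

/* submit_ticket() would call this after recording the new descriptor. */
void wake_poller(void)
{
    char c = 1;
    ssize_t r = write(wake_fds[1], &c, 1);  /* EAGAIN: a wakeup is already pending */
    (void)r;
}

/* One poll step. Returns 1 if woken (rebuild the pollset), 0 on timeout,
 * -1 on error. A real loop would poll the connection fds here too. */
int poll_once(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_fds[0], .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    char drain[64];
    while (read(wake_fds[0], drain, sizeof drain) > 0)
        ;                                   /* empty it so the next poll can block */
    return 1;
}
```

Several wakeups before the poller runs collapse into one, which is fine because the poller rebuilds its whole set each time.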


> http_keepalive_processor(event)
>     if (keepalive)
>         return new_read_ticket(fd, http_request_processor, context)
>     else
>         return NULL
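A C rendering of the two HTTP processors, as a hypothetical sketch: a buffer stands in for input_buckets, checking for the blank line that ends the headers stands in for need_more_for_this_request, and returning false plays the role of returning NULL. In this sketch a keepalive connection gets a new READ ticket, and a non-keepalive one gets no ticket so the connection can be closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the two HTTP processors. Returning false plays
 * the role of "return NULL": no further ticket, close the connection. */
typedef enum { T_READ, T_WRITE } op_t;

typedef struct conn {
    char in[8192 + 1];   /* accumulated request bytes (stands in for input_buckets) */
    size_t in_len;
    bool keepalive;
} conn_t;

typedef struct ticket ticket_t;
typedef bool (*processor_fn)(conn_t *c, const char *data, size_t n, ticket_t *next);

struct ticket {
    op_t op;
    processor_fn processor;   /* continuation for the event this ticket yields */
    const char *out;          /* WRITE only: bytes to send */
};

static bool http_request_processor(conn_t *c, const char *data, size_t n, ticket_t *next);

static bool http_keepalive_processor(conn_t *c, const char *data, size_t n, ticket_t *next)
{
    (void)data;
    (void)n;
    if (!c->keepalive)
        return false;                          /* no ticket: connection ends */
    c->in_len = 0;                             /* start a fresh request */
    *next = (ticket_t){ T_READ, http_request_processor, NULL };
    return true;
}

static bool http_request_processor(conn_t *c, const char *data, size_t n, ticket_t *next)
{
    if (n > sizeof c->in - 1 - c->in_len)
        return false;                          /* oversized request: give up */
    memcpy(c->in + c->in_len, data, n);
    c->in_len += n;
    c->in[c->in_len] = '\0';
    if (strstr(c->in, "\r\n\r\n") == NULL) {   /* headers incomplete */
        *next = (ticket_t){ T_READ, http_request_processor, NULL };
        return true;
    }
    /* Stand-in for request_handler(), which may block for a long time.
     * Simplification: HTTP/1.1 keepalive unless "Connection: close" appears. */
    c->keepalive = strstr(c->in, "Connection: close") == NULL;
    *next = (ticket_t){ T_WRITE, http_keepalive_processor,
                        "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n" };
    return true;
}
```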
> 
> If we want to allow it, the request_handler() call above could even do
> its own reading and writing of the file descriptor.
> 
> In the single process case on old-school Unix, submit_ticket can just
> tell get_next_event to select+accept w/ a simple mutex around them.  In
> the multiple process case, it can wait on a queue for an outside
> listener thread like in Brian's description. And in some Unixes (and I
> believe Windows with completion ports), the multiprocess case isn't a
> concern. Linux 2.6 could use epoll and avoid all these issues, and 2.4
> has, I believe, a realtime signal interface that does the same thing.
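For the single-process case, the "select+accept w/ a simple mutex" idea might look like the following. It's a sketch under assumptions: poll() replaces select(), and `accept_lock`, `accept_next`, and `listen_loopback` are hypothetical helpers. Because only the lock holder waits on the listener, a new connection wakes one thread rather than all of them.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical get_next_event fragment: wait for and accept one connection
 * while holding the accept mutex. Returns the new fd, or -1 on timeout/error. */
int accept_next(int listen_fd, int timeout_ms)
{
    int fd = -1;
    pthread_mutex_lock(&accept_lock);
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN))
        fd = accept(listen_fd, NULL, NULL);
    pthread_mutex_unlock(&accept_lock);
    return fd;
}

/* Demo helper: listen on 127.0.0.1 with a kernel-chosen port. */
int listen_loopback(unsigned short *port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = 0 };
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof a;
    if (s < 0 || bind(s, (struct sockaddr *)&a, sizeof a) < 0 || listen(s, 16) < 0
        || getsockname(s, (struct sockaddr *)&a, &len) < 0)
        return -1;
    *port = ntohs(a.sin_port);
    return s;
}
```

In the multi-process case a mutex inside one process no longer covers every acceptor, which is why the text above falls back to a queue fed by an outside listener thread.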
> 
> I've glossed over where the conn_recs and request_recs get built.
> That's mainly because I don't know how the multi-protocol stuff deals
> with request_recs :). I would expect conn_recs to be completely generic,
> and request_recs to be somewhat or completely http-specific. Generic
> portions could go into the main event loop, HTTP portions go into the
> http event processors.
> 
> Disadvantages of this proposal I can think of offhand:
> 
> - Because threads are mostly in one large pool, some common structures
>   have to be protected through a mutex. I like paying for mutexes more
>   than paying for context switches though.
> 
> - We're creating and destroying a lot of "objects" (tickets and events).
>   I don't think there'll be much overhead since these aren't real OO
>   objects, but we have to be careful.
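One way to keep the ticket/event churn cheap is to recycle objects through a free list instead of calling malloc() and free() each time. This is an assumed technique, not part of the proposal; `ticket_alloc`, `ticket_release`, and the cut-down `ticket_t` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical ticket, reduced to what the free list needs. */
typedef struct ticket {
    int op, fd;
    void *context;
    struct ticket *next_free;   /* link used only while on the free list */
} ticket_t;

static ticket_t *free_list;
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

ticket_t *ticket_alloc(void)
{
    pthread_mutex_lock(&free_lock);
    ticket_t *t = free_list;
    if (t)
        free_list = t->next_free;   /* reuse a released ticket */
    pthread_mutex_unlock(&free_lock);
    if (!t)
        t = malloc(sizeof *t);      /* list empty: fall back to the heap */
    return t;
}

void ticket_release(ticket_t *t)
{
    pthread_mutex_lock(&free_lock);
    t->next_free = free_list;
    free_list = t;
    pthread_mutex_unlock(&free_lock);
}
```

The free list has its own mutex, so it trades allocator overhead for the kind of lock cost the previous point already accepts; a per-thread cache would avoid even that.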

One more disadvantage:

- Flow control will be difficult.  Here's a tricky scenario I
  just thought of:  The server is configured to run 10 threads.
  Most of the time, it only needs a couple of them, because it's
  serving mostly static content and an occasional PHP request.
  Suddenly, it gets a flood of requests for PHP pages.  The first
  ten of these quickly take over the ten available threads.  As
  PHP doesn't know how to work in an event-driven world, each
  of these requests holds onto its thread for a long time.  When
  one of them finally completes, it produces some content to be
  written.  But the writes may be starved, because the first
  thread that finishes its PHP request and goes back into the
  select loop might find another incoming request and read it
  before doing any writes.  And if that new request is another
  long-running PHP request, it could be a while before we finally
  get around to doing the write.

  It's possible to partly work around this by implementing
  get_next_event() so that it completes all pending, unblocked
  writes before returning.  But more generally, we'll need some
  solution to keep long-running, non-event-based requests from
  taking over all the server threads.  (This is true of my design
  as well.)
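The partial workaround of preferring unblocked writes over new reads could be expressed as a selection policy inside get_next_event(). The sketch assumes the ready descriptors have already been collected into an array; `ready_ev_t` and `pick_next_event` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ready-descriptor record built after select()/poll(). */
typedef enum { EV_READ, EV_WRITE } ev_kind_t;
typedef struct { ev_kind_t kind; int fd; } ready_ev_t;

/* Returns the index of the event to process next, or -1 if none.
 * Any writable descriptor with pending output beats a new read, so a thread
 * coming back from a long request flushes output before taking more work. */
int pick_next_event(const ready_ev_t *ready, size_t n)
{
    int first_read = -1;
    for (size_t i = 0; i < n; i++) {
        if (ready[i].kind == EV_WRITE)
            return (int)i;           /* writes always win */
        if (first_read < 0)
            first_read = (int)i;     /* remember the oldest read as a fallback */
    }
    return first_read;
}
```

This only orders the work; it does nothing to stop ten long-running PHP requests from occupying all ten threads, which is the broader problem described above.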


Brian
