The patch in question was nothing more than an attempt to further improve
the worker MPM, and was perhaps a little premature. I have, however, spent some
time developing two possible alternative implementations of the worker
MPM in addition to what is currently in CVS (which means that I don't
expect what's in there to stay for much longer). I am hoping to gain
access to some SMP boxes to test out my theories, but until then I will
describe the two scenarios here (and perhaps also prepare patches for
everyone to run on their favorite SMP box at their leisure):
1) "short and sweet"
- single listener
- creates a new transaction pool
- uses that pool for the next accept()
- push()es the newly accepted socket and pool on the fd_queue
- multiple workers
- waiting in pop() on the fd_queue
- performs process_socket() with the socket and pool received from pop()
- destroys the pool
Notes: this is almost identical to what was in CVS before the patch
discussed below. The only change I would make would be to
remove one of the two condition variables.
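To make design 1 concrete, here is a minimal sketch of the listener and
worker loops. It is written against present-day APR names
(apr_thread_mutex_t, apr_thread_cond_t, apr_socket_accept), not the lock
API in the current tree, and the fdq_* helpers, FDQ_CAPACITY, and the
process_socket() stand-in are illustrative names of mine, not the real
fd_queue interface:

#include <apr_pools.h>
#include <apr_network_io.h>
#include <apr_thread_mutex.h>
#include <apr_thread_cond.h>

extern void process_socket(apr_pool_t *p, apr_socket_t *sd); /* stand-in */

#define FDQ_CAPACITY 64

typedef struct {
    apr_socket_t *sd;     /* accepted connection */
    apr_pool_t   *ptrans; /* per-transaction pool owned by this entry */
} fdq_elem;

typedef struct {
    fdq_elem            elems[FDQ_CAPACITY];
    int                 head, count;
    apr_thread_mutex_t *lock;
    apr_thread_cond_t  *cond;   /* single CV, broadcast on every change */
} fd_queue;

static void fdq_push(fd_queue *q, apr_socket_t *sd, apr_pool_t *ptrans)
{
    int idx;
    apr_thread_mutex_lock(q->lock);
    while (q->count == FDQ_CAPACITY)           /* listener waits if full */
        apr_thread_cond_wait(q->cond, q->lock);
    idx = (q->head + q->count) % FDQ_CAPACITY;
    q->elems[idx].sd = sd;
    q->elems[idx].ptrans = ptrans;
    q->count++;
    apr_thread_cond_broadcast(q->cond);
    apr_thread_mutex_unlock(q->lock);
}

static void fdq_pop(fd_queue *q, apr_socket_t **sd, apr_pool_t **ptrans)
{
    apr_thread_mutex_lock(q->lock);
    while (q->count == 0)                      /* workers wait if empty */
        apr_thread_cond_wait(q->cond, q->lock);
    *sd = q->elems[q->head].sd;
    *ptrans = q->elems[q->head].ptrans;
    q->head = (q->head + 1) % FDQ_CAPACITY;
    q->count--;
    apr_thread_cond_broadcast(q->cond);
    apr_thread_mutex_unlock(q->lock);
}

/* Listener: one new transaction pool per accepted connection. */
static void listener_loop(fd_queue *q, apr_socket_t *lsd, apr_pool_t *pconf)
{
    for (;;) {
        apr_pool_t *ptrans;
        apr_socket_t *csd;
        apr_pool_create(&ptrans, pconf);       /* new pool ...            */
        if (apr_socket_accept(&csd, lsd, ptrans) != APR_SUCCESS) {
            apr_pool_destroy(ptrans);
            continue;
        }
        fdq_push(q, csd, ptrans);              /* ... handed to a worker  */
    }
}

/* Worker: serve the connection, then destroy the pool it arrived with. */
static void worker_loop(fd_queue *q)
{
    for (;;) {
        apr_socket_t *csd;
        apr_pool_t *ptrans;
        fdq_pop(q, &csd, &ptrans);
        process_socket(ptrans, csd);
        apr_pool_destroy(ptrans);
    }
}

Broadcasting on every push and pop is wasteful compared to what the real
fd_queue does, but it keeps the single-CV version obviously free of lost
wakeups.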
2) "time-space tradeoff"
- single listener
- pop()s the next available worker-thread
- uses the pool from this worker-thread to make the call to accept()
- signals a condition in that worker-thread after the accept()
- multiple workers
- creates a transaction pool
- push()es thread info (pool, socket-pointer, etc)
- waits on signal from listener
- (check that listener woke us up)
- perform process_socket() with the socket set from the listener
- clear the pool
Notes: This adds some complication to the fd_queue code, but it removes
most of the complication that was a problem in the below patch.
My implementation was able to remove much of the arithmetic from
the critical sections of the fd_queue, possibly decreasing contention
in this part of the code (especially good for scalability). From
what we have in CVS, it brings us from 4 shared condition variables
down to 1, and adds a new CV to each thread (sort of like a
time-space tradeoff). Your note about LIFO vs. FIFO is noted, and
I'll implement a minor modification to test out this theory.
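And here is a comparable sketch of design 2, under the same caveats:
modern APR names, with worker_info, idle_stack, MAX_WORKERS, and
process_socket() being illustrative stand-ins rather than the code I
actually have. I keep the idle workers in a LIFO stack here, which happens
to line up with Dean's LIFO comment below, though a FIFO queue would work
the same way:

#include <apr_pools.h>
#include <apr_network_io.h>
#include <apr_thread_mutex.h>
#include <apr_thread_cond.h>

extern void process_socket(apr_pool_t *p, apr_socket_t *sd); /* stand-in */

#define MAX_WORKERS 64   /* must be >= the number of worker threads */

typedef struct {
    apr_pool_t         *ptrans;   /* this worker's transaction pool        */
    apr_socket_t       *sd;       /* filled in by the listener             */
    int                 signaled; /* "the listener really did wake us up"  */
    apr_thread_mutex_t *lock;     /* guards sd/signaled and the private CV */
    apr_thread_cond_t  *wakeup;   /* the per-worker condition variable     */
} worker_info;

typedef struct {                  /* idle workers, kept as a LIFO stack    */
    worker_info        *idle[MAX_WORKERS];
    int                 count;
    apr_thread_mutex_t *lock;
    apr_thread_cond_t  *not_empty; /* the single shared CV                 */
} idle_stack;

static void stack_push(idle_stack *s, worker_info *wi)
{
    apr_thread_mutex_lock(s->lock);
    s->idle[s->count++] = wi;
    apr_thread_cond_signal(s->not_empty);  /* only the listener waits here */
    apr_thread_mutex_unlock(s->lock);
}

static worker_info *stack_pop(idle_stack *s)
{
    worker_info *wi;
    apr_thread_mutex_lock(s->lock);
    while (s->count == 0)
        apr_thread_cond_wait(s->not_empty, s->lock);
    wi = s->idle[--s->count];              /* most recently idled worker */
    apr_thread_mutex_unlock(s->lock);
    return wi;
}

/* Listener: accept directly into the chosen worker's pool, then signal
 * that worker's private condition variable. */
static void listener_loop(idle_stack *s, apr_socket_t *lsd)
{
    for (;;) {
        worker_info *wi = stack_pop(s);
        apr_thread_mutex_lock(wi->lock);
        if (apr_socket_accept(&wi->sd, lsd, wi->ptrans) != APR_SUCCESS)
            wi->sd = NULL;                 /* worker just re-queues itself */
        wi->signaled = 1;
        apr_thread_cond_signal(wi->wakeup);
        apr_thread_mutex_unlock(wi->lock);
    }
}

/* Worker: create the pool once, advertise ourselves as idle, wait for the
 * listener's signal, serve the connection, clear (not destroy) the pool. */
static void worker_loop(idle_stack *s, worker_info *wi, apr_pool_t *pchild)
{
    apr_pool_create(&wi->ptrans, pchild);
    for (;;) {
        apr_thread_mutex_lock(wi->lock);
        wi->sd = NULL;
        wi->signaled = 0;
        stack_push(s, wi);
        while (!wi->signaled)              /* check the listener woke us */
            apr_thread_cond_wait(wi->wakeup, wi->lock);
        apr_thread_mutex_unlock(wi->lock);
        if (wi->sd != NULL) {
            process_socket(wi->ptrans, wi->sd);
            apr_pool_clear(wi->ptrans);
        }
    }
}

The point of the extra per-thread CV is that the listener hands each
connection directly to the thread (and the pool) that will serve it; the
only shared CV left is not_empty, and only the listener ever waits on it.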
I had hoped that I would be able to test out these two implementations on
some big hardware before posting such lengthy design descriptions to the
list, but if it would interest others I would be willing to prepare a patch
illustrating the two designs above. I welcome any critical dissection of
them.
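Finally, since I said I'd test the LIFO idea, here is roughly what I have
in mind, applied to the pool recycling from the committed patch. Again the
names (pool_stack, pool_stack_get/put, MAX_RECYCLED) are mine and the real
pool_queue code differs; this only shows the shape of the change:

#include <apr_pools.h>
#include <apr_thread_mutex.h>

#define MAX_RECYCLED 64

typedef struct {
    apr_pool_t         *pools[MAX_RECYCLED];
    int                 count;
    apr_thread_mutex_t *lock;
} pool_stack;

/* Listener side: grab the hottest recycled pool, or make a new one. */
static apr_pool_t *pool_stack_get(pool_stack *ps, apr_pool_t *parent)
{
    apr_pool_t *p = NULL;
    apr_thread_mutex_lock(ps->lock);
    if (ps->count > 0)
        p = ps->pools[--ps->count];        /* LIFO: most recently freed */
    apr_thread_mutex_unlock(ps->lock);
    if (p == NULL)
        apr_pool_create(&p, parent);       /* stack empty: fall back */
    return p;
}

/* Worker side: clear the pool and push it back on top of the stack. */
static void pool_stack_put(pool_stack *ps, apr_pool_t *p)
{
    apr_pool_clear(p);
    apr_thread_mutex_lock(ps->lock);
    if (ps->count < MAX_RECYCLED) {
        ps->pools[ps->count++] = p;
        apr_thread_mutex_unlock(ps->lock);
    }
    else {
        apr_thread_mutex_unlock(ps->lock);
        apr_pool_destroy(p);               /* overflow: just destroy it */
    }
}

The pool handed out is always the one most recently cleared, which is the
object most likely to still have warm TLB entries and cache lines, per
Dean's comment below.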
-aaron
On Mon, Sep 10, 2001 at 01:30:03PM -0700, dean gaudet wrote:
> this is the wrong way to fix this problem.
>
> i can't imagine any reason why creating a pool should be slow --
> rather than band-aid around it, i think it'd be better to find out that
> problem first. it should be as simple as a couple pointer operations.
>
> freelists are a feature of modern memory allocators -- including per-cpu
> optimisations. the right fix is probably to start relying on libc more.
> (see apr-dev threads a few months ago where i was advocating getting rid
> of apr freelists entirely.)
>
> fwiw -- freelist implementations are almost always better off LIFO rather
> than FIFO. basically the most recently freed object is more likely to be
> mapped in the TLB and have valid cache lines. older objects will incur
> TLB and cache misses.
>
> you can take these comments as a veto, but i know the patch has been
> committed already.
>
> -dean
>
> On Mon, 27 Aug 2001, Aaron Bannert wrote:
>
> > This patch implements a resource pool of context pools -- a queue of
> > available pools that the listener thread can pull from when accepting
> > a request. The worker thread that picks up that request then uses
> > that pool for the lifetime of that transaction, clear()ing the pool
> > and releasing it back to what I'm calling the "pool_queue" (har har).
> > This replaces the prior implementation that would create and destroy
> > a transaction pool for each and every request.
> >
> > I'm seeing a small performance improvement with this patch, but I suspect
> > the fd_queue code could be improved for better parallelism. I also
> > suspect that with better testing this algorithm may prove more scalable.
> >
> > -aaron