Sorry for the late reply - I've been out for the holidays.
> By the way, how are you doing it? Do you use a mutex routine that works
> in LIFO fashion?
Speedycgi uses separate backend processes that run the perl interpreters.
The frontend processes (the httpd's that are running mod_speedycgi)
communicate with the backends, sending over the request and getting the output.
Speedycgi uses some shared memory (an mmap'ed file in /tmp) to keep track
of the backends and frontends. This shared memory contains the queue.
When a backend becomes free, it adds itself to the front of this queue.
When a frontend needs a backend, it takes the first one from the front
of the queue.
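In case it helps to picture it, the queue behaves like a simple stack.
Here's a rough sketch of the idea in plain Perl (an ordinary array is
standing in for the mmap'ed shared-memory structure, and the locking
that the real code needs is left out):

    my @queue;                     # front of the queue is $queue[0]

    sub backend_becomes_free {     # a backend finishes and re-queues itself
        my ($backend) = @_;
        unshift @queue, $backend;  # onto the FRONT, so it gets reused first
    }

    sub frontend_grabs_backend {   # a frontend takes the most-recently-used one
        return shift @queue;       # undef if no backend is currently free
    }

Since free backends go back on the front and are also pulled from the
front, the most recently used backend is always the next one handed
out - that's where the MRU/LIFO behavior comes from.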
>
> > I am saying that since SpeedyCGI uses MRU to allocate requests to perl
> > interpreters, it winds up using a lot fewer interpreters to handle the
> > same number of requests.
>
> What I was saying is that it doesn't make sense for one to need fewer
> interpreters than the other to handle the same concurrency. If you have
> 10 requests at the same time, you need 10 interpreters. There's no way
> speedycgi can do it with fewer, unless it actually makes some of them
> wait. That could be happening, due to the fork-on-demand model, although
> your warmup round (priming the pump) should take care of that.
What you say would be true if you had 10 processors and could get
true concurrency. But on single-cpu systems you usually don't need
10 unix processes to handle 10 requests concurrently, since they get
serialized by the kernel anyway. I'll try to show how mod_perl handles
10 concurrent requests, and compare that to mod_speedycgi so you can
see the difference.
For mod_perl, let's assume we have 10 httpd's, h1 through h10,
when the 10 concurrent requests come in. h1 has acquired the mutex,
and h2-h10 are waiting (in order) on the mutex. Here's how the cpu
actually runs the processes:
h1 accepts
h1 releases the mutex, making h2 runnable
h1 runs the perl code and produces the results
h1 waits for the mutex
h2 accepts
h2 releases the mutex, making h3 runnable
h2 runs the perl code and produces the results
h2 waits for the mutex
h3 accepts
...
This is pretty straightforward. Each of h1-h10 runs the perl code
exactly once. They may not run exactly in this order since a process
could get pre-empted, or blocked waiting to send data to the client,
etc. But regardless, each of the 10 processes will run the perl code
exactly once.
Here's the mod_speedycgi example - it too uses httpd's h1-h10, and they
all take turns running the mod_speedycgi frontend code. But the backends,
where the perl code runs, don't all have to be used fairly - they're
picked MRU instead. I'll use b1 and b2 to represent 2 speedycgi backend processes,
already queued up in that order.
Here's a possible speedycgi scenario:
h1 accepts
h1 releases the mutex, making h2 runnable
h1 sends a request to b1, making b1 runnable
h2 accepts
h2 releases the mutex, making h3 runnable
h2 sends a request to b2, making b2 runnable
b1 runs the perl code and sends the results to h1, making h1 runnable
b1 adds itself to the front of the queue
h3 accepts
h3 releases the mutex, making h4 runnable
h3 sends a request to b1, making b1 runnable
b2 runs the perl code and sends the results to h2, making h2 runnable
b2 adds itself to the front of the queue
h1 produces the results it got from b1
h1 waits for the mutex
h4 accepts
h4 releases the mutex, making h5 runnable
h4 sends a request to b2, making b2 runnable
b1 runs the perl code and sends the results to h3, making h3 runnable
b1 adds itself to the front of the queue
h2 produces the results it got from b2
h2 waits for the mutex
h5 accepts
h5 releases the mutex, making h6 runnable
h5 sends a request to b1, making b1 runnable
b2 runs the perl code and sends the results to h4, making h4 runnable
b2 adds itself to the front of the queue
This may be hard to follow, but hopefully you can see that the 10 httpd's
just take turns using b1 and b2 over and over. So, the 10 concurrent
requests end up being handled by just two perl backend processes. Again,
this is simplified. If the perl processes get blocked, or pre-empted,
you'll end up using more of them. But generally, the LIFO will cause
SpeedyCGI to sort-of settle into the smallest number of processes needed for
the task.
The difference between the two approaches is that the mod_perl
implementation forces unix to use 10 separate perl processes, while the
mod_speedycgi implementation sort-of decides on the fly how many
different processes are needed.
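To make the "how many processes get touched" point concrete, here's a
toy Perl simulation (my own sketch, not code from either package). It
feeds the same trace of overlapping requests - 10 requests, never more
than 2 outstanding at once - to two allocation policies: LIFO, which is
what the speedycgi queue does, and FIFO, which stands in for the way
the accept mutex rotates through all of the mod_perl children:

    #!/usr/bin/perl
    use strict;

    # 'G' = a request grabs a worker, 'R' = the oldest outstanding
    # request releases its worker.  10 grabs, at most 2 outstanding.
    my @trace = qw(G G R G R G R G R G R G R G R G R G R R);

    for my $policy ('lifo', 'fifo') {
        my @free = map { "w$_" } 1 .. 10;   # 10 idle workers to start
        my (@busy, %used);
        for my $ev (@trace) {
            if ($ev eq 'G') {
                my $w = shift @free;        # both policies take from the front
                $used{$w} = 1;
                push @busy, $w;
            } else {
                my $w = shift @busy;        # oldest outstanding request is done
                if ($policy eq 'lifo') { unshift @free, $w }  # back on the front
                else                   { push @free, $w }     # back of the line
            }
        }
        printf "%s: %d distinct workers used\n", $policy, scalar(keys %used);
    }

This prints "lifo: 2 distinct workers used" and "fifo: 10 distinct
workers used", which matches the two walkthroughs above: same load and
same concurrency, but LIFO keeps cycling the same couple of workers
while FIFO spreads the work across all ten.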
> > Please let me know what you think I should change. So far my
> > benchmarks only show one trend, but if you can tell me specifically
> > what I'm doing wrong (and it's something reasonable), I'll try it.
>
> Try setting MinSpareServers as low as possible and setting MaxClients to a
> value that will prevent swapping. Then set ab for a concurrency equal to
> your MaxClients setting.
I previously had set MinSpareServers to 1 - it did help mod_perl get
to a higher concurrency level, but didn't change the overall trend.
I found that setting MaxClients to 100 stopped the paging. At concurrency
level 100, both mod_perl and mod_speedycgi showed similar rates with ab.
Even at higher levels (300), they were comparable.
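For reference, here's roughly the relevant configuration for those runs
(these are the standard Apache directives; the URL and request count in
the ab line are just placeholders for whatever you're testing):

    MinSpareServers 1
    MaxClients      100

    # and the matching ab run, with concurrency equal to MaxClients:
    #   ab -c 100 -n 10000 http://localhost/perl/hello_world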
But, to show that the underlying problem is still there, I then changed
the hello_world script and doubled the amount of un-shared memory.
And of course the problem then came back for mod_perl, although speedycgi
continued to work fine. I think this shows that mod_perl is still
using quite a bit more memory than speedycgi to provide the same service.
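In case it's not clear what I mean by adding un-shared memory: the
hello_world script just grows a big per-process data structure at
request time, so copy-on-write can't keep those pages shared with the
parent. Something along these lines (a sketch of the idea, not the
exact script from the benchmark):

    #!/usr/bin/perl
    # Touch a couple of MB of data in each child so those pages
    # become unshared.  Double the 2048 to double the un-shared memory.
    my @filler = ('x' x 1024) x 2048;   # ~2MB of per-process strings

    print "Content-type: text/html\n\n";
    print "Hello World\n";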
> > I believe that with speedycgi you don't have to lower the MaxClients
> > setting, because it's able to handle a larger number of clients, at
> > least in this test.
>
> Maybe what you're seeing is an ability to handle a larger number of
> requests (as opposed to clients) because of the performance benefit I
> mentioned above.
I don't follow.
> I don't know how hard ab tries to make sure you really
> have n simultaneous clients at any given time.
I do know that the ab "-c" option does seem to have an effect on the
tests I've been running.
> > In other words, if with mod_perl you had to turn
> > away requests, but with mod_speedycgi you did not, that would just
> > prove that speedycgi is more scalable.
>
> Are the speedycgi+Apache processes smaller than the mod_perl
> processes? If not, the maximum number of concurrent requests you can
> handle on a given box is going to be the same.
The total size of the httpd's running mod_speedycgi, plus the size of
the speedycgi perl backends, is significantly smaller than the total
size of the httpd's running mod_perl.
The reason is that speedycgi needs only a handful of perl processes to
handle the same load, whereas mod_perl has a perl interpreter inside
every one of its httpd's.