Sorry for the late reply - I've been out for the holidays.
> By the way, how are you doing it? Do you use a mutex routine that works
> in LIFO fashion?
Speedycgi uses separate backend processes that run the perl interpreters.
The frontend processes (the httpd's that are running mod_speedycgi)
communicate with the backends, sending over the request and getting the output.
Speedycgi uses some shared memory (an mmap'ed file in /tmp) to keep track
of the backends and frontends. This shared memory contains the queue.
When a backend becomes free, it adds itself to the front of this queue.
When a frontend needs a backend, it takes the first one from the front
of the queue.
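In case it helps to picture it, the queue behaves like a simple stack.
Here's a rough sketch of the idea in plain Perl (an ordinary array is
standing in for the mmap'ed shared-memory structure, and the locking
that the real code needs is left out):

    my @queue;                     # front of the queue is $queue[0]

    sub backend_becomes_free {     # a backend finishes and re-queues itself
        my ($backend) = @_;
        unshift @queue, $backend;  # onto the FRONT, so it gets reused first
    }

    sub frontend_grabs_backend {   # a frontend takes the most-recently-used one
        return shift @queue;       # undef if no backend is currently free
    }

Since free backends go back on the front and are also pulled from the
front, the most recently used backend is always the next one handed
out - that's where the MRU/LIFO behavior comes from.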
>
> > I am saying that since SpeedyCGI uses MRU to allocate requests to perl
> > interpreters, it winds up using a lot fewer interpreters to handle the
> > same number of requests.
>
> What I was saying is that it doesn't make sense for one to need fewer
> interpreters than the other to handle the same concurrency. If you have
> 10 requests at the same time, you need 10 interpreters. There's no way
> speedycgi can do it with fewer, unless it actually makes some of them
> wait. That could be happening, due to the fork-on-demand model, although
> your warmup round (priming the pump) should take care of that.
What you say would be true if you had 10 processors and could get
true concurrency. But on single-cpu systems you usually don't need
10 unix processes to handle 10 requests concurrently, since they get
serialized by the kernel anyway. I'll try to show how mod_perl handles
10 concurrent requests, and compare that to mod_speedycgi so you can
see the difference.
For mod_perl, let's assume we have 10 httpd's, h1 through h10,
when the 10 concurrent requests come in. h1 has acquired the mutex,
and h2-h10 are waiting (in order) on the mutex. Here's how the cpu
actually runs the processes:
h1 accepts
h1 releases the mutex, making h2 runnable
h1 runs the perl code and produces the results
h1 waits for the mutex
h2 accepts
h2 releases the mutex, making h3 runnable
h2 runs the perl code and produces the results
h2 waits for the mutex
h3 accepts
...
This is pretty straightforward. Each of h1-h10 runs the perl code
exactly once. They may not run exactly in this order since a process
could get pre-empted, or blocked waiting to send data to the client,
etc. But regardless, each of the 10 processes will run the perl code
exactly once.
Here's the mod_speedycgi example - it too uses httpd's h1-h10, and they
all take turns running the mod_speedycgi frontend code. But the backends,
where the perl code runs, don't all have to be used fairly - they're
picked MRU instead. I'll use b1 and b2 to represent 2 speedycgi backend processes,
already queued up in that order.
Here's a possible speedycgi scenario:
h1 accepts
h1 releases the mutex, making h2 runnable
h1 sends a request to b1, making b1 runnable
h2 accepts
h2 releases the mutex, making h3 runnable
h2 sends a request to b2, making b2 runnable
b1 runs the perl code and sends the results to h1, making h1 runnable
b1 adds itself to the front of the queue
h3 accepts
h3 releases the mutex, making h4 runnable
h3 sends a request to b1, making b1 runnable
b2 runs the perl code and sends the results to h2, making h2 runnable
b2 adds itself to the front of the queue
h1 produces the results it got from b1
h1 waits for the mutex
h4 accepts
h4 releases the mutex, making h5 runnable
h4 sends a request to b2, making b2 runnable
b1 runs the perl code and sends the results to h3, making h3 runnable
b1 adds itself to the front of the queue
h2 produces the results it got from b2
h2 waits for the mutex
h5 accepts
h5 releases the mutex, making h6 runnable
h5 sends a request to b1, making b1 runnable
b2 runs the perl code and sends the results to h4, making h4 runnable
b2 adds itself to the front of the queue
This may be hard to follow, but hopefully you can see that the 10 httpd's
just take turns using b1 and b2 over and over. So, the 10 concurrent
requests end up being handled by just two perl backend processes. Again,
this is simplified. If the perl processes get blocked, or pre-empted,
you'll end up using more of them. But generally, the LIFO will cause
SpeedyCGI to sort-of settle into the smallest number of processes needed for
the task.
The difference between the two approaches is that the mod_perl
implementation forces unix to use 10 separate perl processes, while the
mod_speedycgi implementation sort-of decides on the fly how many
different processes are needed.
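To make the "how many processes get touched" point concrete, here's a
toy Perl simulation (my own sketch, not code from either package). It
feeds the same trace of overlapping requests - 10 requests, never more
than 2 outstanding at once - to two allocation policies: LIFO, which is
what the speedycgi queue does, and FIFO, which stands in for the way
the accept mutex rotates through all of the mod_perl children:

    #!/usr/bin/perl
    use strict;

    # 'G' = a request grabs a worker, 'R' = the oldest outstanding
    # request releases its worker.  10 grabs, at most 2 outstanding.
    my @trace = qw(G G R G R G R G R G R G R G R G R G R R);

    for my $policy ('lifo', 'fifo') {
        my @free = map { "w$_" } 1 .. 10;   # 10 idle workers to start
        my (@busy, %used);
        for my $ev (@trace) {
            if ($ev eq 'G') {
                my $w = shift @free;        # both policies take from the front
                $used{$w} = 1;
                push @busy, $w;
            } else {
                my $w = shift @busy;        # oldest outstanding request is done
                if ($policy eq 'lifo') { unshift @free, $w }  # back on the front
                else                   { push @free, $w }     # back of the line
            }
        }
        printf "%s: %d distinct workers used\n", $policy, scalar(keys %used);
    }

This prints "lifo: 2 distinct workers used" and "fifo: 10 distinct
workers used", which matches the two walkthroughs above: same load and
same concurrency, but LIFO keeps cycling the same couple of workers
while FIFO spreads the work across all ten.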
> > Please let me know what you think I should change. So far my
> > benchmarks only show one trend, but if you can tell me specifically
> > what I'm doing wrong (and it's something reasonable), I'll try it.
>
> Try setting MinSpareServers as low as possible and setting MaxClients to a
> value that will prevent swapping. Then set ab for a concurrency equal to
> your MaxClients setting.
I previously had set MinSpareServers to 1 - it did help mod_perl get
to a higher concurrency level, but didn't change the overall trend.
I found that setting MaxClients to 100 stopped the paging. At concurrency
level 100, both mod_perl and mod_speedycgi showed similar rates with ab.
Even at higher levels (300), they were comparable.
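For reference, here's roughly the relevant configuration for those runs
(these are the standard Apache directives; the URL and request count in
the ab line are just placeholders for whatever you're testing):

    MinSpareServers 1
    MaxClients      100

    # and the matching ab run, with concurrency equal to MaxClients:
    #   ab -c 100 -n 10000 http://localhost/perl/hello_world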
But, to show that the underlying problem is still there, I then changed
the hello_world script and doubled the amount of un-shared memory.
And of course the problem then came back for mod_perl, although speedycgi
continued to work fine. I think this shows that mod_perl is still
using quite a bit more memory than speedycgi to provide the same service.
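In case it's not clear what I mean by adding un-shared memory: the
hello_world script just grows a big per-process data structure at
request time, so copy-on-write can't keep those pages shared with the
parent. Something along these lines (a sketch of the idea, not the
exact script from the benchmark):

    #!/usr/bin/perl
    # Touch a couple of MB of data in each child so those pages
    # become unshared.  Double the 2048 to double the un-shared memory.
    my @filler = ('x' x 1024) x 2048;   # ~2MB of per-process strings

    print "Content-type: text/html\n\n";
    print "Hello World\n";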
> > I believe that with speedycgi you don't have to lower the MaxClients
> > setting, because it's able to handle a larger number of clients, at
> > least in this test.
>
> Maybe what you're seeing is an ability to handle a larger number of
> requests (as opposed to clients) because of the performance benefit I
> mentioned above.
I don't follow.
> I don't know how hard ab tries to make sure you really
> have n simultaneous clients at any given time.
I do know that the ab "-c" option does seem to have an effect on the
tests I've been running.
> > In other words, if with mod_perl you had to turn
> > away requests, but with mod_speedycgi you did not, that would just
> > prove that speedycgi is more scalable.
>
> Are the speedycgi+Apache processes smaller than the mod_perl
> processes? If not, the maximum number of concurrent requests you can
> handle on a given box is going to be the same.
The total size of the httpd's running mod_speedycgi, plus the size of
the speedycgi perl backends, is significantly smaller than the total
size of the httpd's running mod_perl.
The reason is that speedycgi needs only a handful of perl processes to
handle the same load, whereas mod_perl has a perl interpreter inside
every one of its httpd's.