> Gunther Birznieks wrote:
 > > Sam just posted this to the speedycgi list just now.
 > [...]
 > > >The underlying problem in mod_perl is that apache likes to spread out
 > > >web requests to as many httpd's, and therefore as many mod_perl interpreters,
 > > >as possible using an LRU selection process for picking httpd's.
 > 
 > Hmmm... this doesn't sound right.  I've never looked at the code in
 > Apache that does this selection, but I was under the impression that the
 > choice of which process would handle each request was an OS dependent
 > thing, based on some sort of mutex.
 > 
 > Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html
 > 
 > Doesn't that appear to be saying that whichever process gets into the
 > mutex first will get the new request?

 I would agree that whichever process gets into the mutex first will get
 the new request.  That's exactly the problem I'm describing.  What you
 are describing here is first-in, first-out behaviour which implies LRU
 behaviour.

 Processes 1, 2, 3 are running.  1 finishes and requests the mutex, then
 2 finishes and requests the mutex, then 3 finishes and requests the mutex.
 So when the next three requests come in, they are handled in the same order:
 1, then 2, then 3 - this is FIFO or LRU.  This is bad for performance.
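
 If it helps, here's a toy sketch (not apache's actual code, just the two
 queueing policies) showing why they end up touching very different numbers
 of processes:

 use strict;
 use warnings;

 my @workers = (1 .. 10);    # ten idle httpd's/interpreters

 for my $policy ('FIFO (LRU, apache-style)', 'LIFO (MRU, speedy-style)') {
     my @idle = @workers;
     my %used;
     for (1 .. 50) {                  # fifty sequential requests
         # FIFO hands the request to the process that's been idle longest,
         # LIFO to the one that finished most recently.
         my $w = $policy =~ /^FIFO/ ? shift @idle : pop @idle;
         $used{$w}++;
         push @idle, $w;              # the process finishes and re-queues
     }
     printf "%-26s used %d distinct processes\n", $policy, scalar keys %used;
 }

 The FIFO policy cycles through all ten processes, the LIFO policy keeps
 re-using the same one.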

 > In my experience running
 > development servers on Linux it always seemed as if the requests
 > would continue going to the same process until a request came in when
 > that process was already busy.

 No, they don't.  They go round-robin (or LRU as I say it).

 Try this simple test script:

 use strict;
 use CGI;

 my $cgi = CGI->new;
 print $cgi->header();    # Content-Type header
 print "mypid=$$\n";      # $$ is the pid of the interpreter that served this request

 With mod_perl you constantly get different pids.  With mod_speedycgi you
 usually get the same pid.  This is a really good way to see the LRU/MRU
 difference that I'm talking about.
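
 If you want numbers instead of eyeballing it, here's a quick throwaway
 client you could use (the URL is only a placeholder - point it at wherever
 the script above gets installed):

 use strict;
 use warnings;
 use LWP::Simple qw(get);

 my $url = shift || 'http://localhost/perl/mypid.cgi';   # placeholder URL

 my %pids;
 for (1 .. 50) {
     my $body = get($url);
     next unless defined $body;
     $pids{$1}++ if $body =~ /mypid=(\d+)/;
 }
 printf "%d distinct pids over 50 sequential requests\n", scalar keys %pids;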

 Here's the problem - the mutex in apache is implemented using a lock
 on a file.  It's left up to the kernel to decide which process to give
 that lock to.

 Now, if you're writing a unix kernel and implementing this file locking code,
 what implementation would you use?  Well, this is a general purpose thing -
 you have 100 or so processes all trying to acquire this file lock.  You could
 give out the lock randomly or in some ordered fashion.  If I were writing
 the kernel I would give it out in a round-robin fashion (or the
 least-recently-used process as I referred to it before).  Why?  Because
 otherwise one of those processes may starve waiting for this lock - it may
 never get the lock unless you do it in a fair (round-robin) manner.

 The kernel doesn't know that all these httpd's are exactly the same.
 The kernel is implementing a general-purpose file-locking scheme and
 it doesn't know whether one process is more important than another.  If
 it's not fair about giving out the lock a very important process might
 starve.

 Take a look at fs/locks.c (I'm looking at Linux 2.3.46).  In there is this
 comment:

 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  */
 static void locks_insert_block(struct file_lock *blocker, struct file_lock *waiter)
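
 If you want to see the same contention at user level, here's a little
 fork/flock sketch - it isn't apache's code (apache uses fcntl or flock
 depending on the platform), just an illustration of several children all
 blocking on one lock file and letting the kernel pick who goes next:

 use strict;
 use warnings;
 use Fcntl qw(:flock);

 my $lockfile = '/tmp/accept.lock';   # stand-in for apache's accept lock file

 for my $n (1 .. 5) {
     next if fork;                    # parent keeps forking; children fall through
     open my $fh, '>', $lockfile or die "open: $!";
     for (1 .. 10) {
         flock $fh, LOCK_EX or die "flock: $!";   # block until the kernel grants it
         print "child $n (pid $$) got the lock\n";
         select undef, undef, undef, 0.01;        # pretend to serve a request
         flock $fh, LOCK_UN;
     }
     exit;
 }
 wait() for 1 .. 5;                   # reap the children

 Run it and watch the order the pids come out in - that ordering is entirely
 the kernel's decision.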

 > As I understand it, the implementation of "wake-one" scheduling in the
 > 2.4 Linux kernel may affect this as well.  It may then be possible to
 > skip the mutex and use unserialized accept for single socket servers,
 > which will definitely hand process selection over to the kernel.

 If the kernel implemented the queueing for multiple accepts using a LIFO
 instead of a FIFO and apache used this method instead of file locks,
 then that would probably solve it.

 Just found this on the net on this subject:
    http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0455.html
    http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0453.html

 > > >The problem is that at a high concurrency level, mod_perl is using lots
 > > >and lots of different perl-interpreters to handle the requests, each
 > > >with its own un-shared memory.  It's doing this due to its LRU design.
 > > >But with SpeedyCGI's MRU design, only a few speedy_backends are being used
 > > >because as much as possible it tries to use the same interpreter over and
 > > >over and not spread out the requests to lots of different interpreters.
 > > >Mod_perl is using lots of perl-interpreters, while speedycgi is only using
 > > >a few.  mod_perl is requiring that lots of interpreters be in memory in
 > > >order to handle the requests, whereas speedy only requires a small number
 > > >of interpreters to be in memory.
 > 
 > This test - building up unshared memory in each process - is somewhat
 > suspect since in most setups I've seen, there is a very significant
 > amount of memory being shared between mod_perl processes.

 My message and testing concern un-shared memory only.  If all of your memory
 is shared, then there shouldn't be a problem.

 But a point I'm making is that with mod_perl you have to go to great
 lengths to write your code so as to avoid unshared memory.  My claim is that
 with mod_speedycgi you don't have to concern yourself as much with this.
 You can concentrate more on the application and less on performance tuning.
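
 For example, the standard mod_perl advice is to preload as much as possible
 in the parent httpd so the compiled code is shared copy-on-write - something
 like this in a startup.pl (the module names are only placeholders for
 whatever your application really uses):

 # startup.pl - pulled in by the parent via a PerlRequire directive
 use strict;

 use CGI ();
 CGI->compile(':all');   # precompile CGI.pm's autoloaded methods before the fork

 use DBI ();             # preload any other large modules here

 1;

 That kind of tuning is exactly the effort I'm saying you can mostly skip
 with speedy, because there are so few interpreters to begin with.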

 > Regardless,
 > the explanation here doesn't make sense to me.  If we assume that each
 > approach is equally fast (as Sam seems to say earlier in his message)
 > then it should take an equal number of speedycgi and mod_perl processes
 > to handle the same concurrency.

 I don't assume that each approach is equally fast under all loads.  They
 were about the same at concurrency level 1, but at higher concurrency
 levels they weren't.

 I am saying that since SpeedyCGI uses MRU to allocate requests to perl
 interpreters, it winds up using a lot fewer interpreters to handle the
 same number of requests.

 On a single-CPU system, of course, at some point all the concurrency has
 to be serialized.  mod_speedycgi and mod_perl take different approaches
 before getting to that point.  mod_speedycgi tries to use as
 small a number of unix processes as possible, while mod_perl tries to
 use a very large number of unix processes.
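
 To put some purely illustrative numbers on it, using the 50k of un-shared
 memory per interpreter that I mention below: if MRU keeps the work
 concentrated in 10 interpreters, that's roughly 10 x 50k = 500k of un-shared
 memory; LRU cycling the same load through 100 interpreters costs roughly
 100 x 50k = 5000k, before you count anything else.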

 > That leads me to believe that what's really happening here is that
 > Apache is pre-forking a bit over-zealously in response to a sudden surge
 > of traffic from ab, and thus has extra unused processes sitting around
 > waiting, while speedycgi is avoiding this situation by waiting for
 > someone to try and use the processes before forking them (i.e. no
 > pre-forking).  The speedycgi way causes a brief delay while new
 > processes fork, but doesn't waste memory.  Does this sound like a
 > plausible explanation to folks?

 I don't think it's pre-forking.  When I ran my tests I would always run
 them twice, and take the results from the second run.  The first run
 was just to "prime the pump".

 I tried reducing MinSpareServers, and this did help mod_perl get a higher
 concurrency number, but it would still run into a wall where speedycgi
 would not.
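
 For reference, the relevant httpd.conf knobs look something like this (the
 values here are only examples, not what I benchmarked with):

 MinSpareServers  1
 MaxSpareServers  5
 StartServers     1
 MaxClients       100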
 
 > This is probably all a moot point on a server with a properly set
 > MaxClients and Apache::SizeLimit that will not go into swap.

 Please let me know what you think I should change.  So far my
 benchmarks only show one trend, but if you can tell me specifically
 what I'm doing wrong (and it's something reasonable), I'll try it.

 I don't think SizeLimit is the answer - my process isn't growing.  It's
 using the same 50k of un-shared memory over and over.

 I believe that with speedycgi you don't have to lower the MaxClients
 setting, because it's able to handle a larger number of clients, at
 least in this test.  In other words, if mod_perl had to turn away
 requests while mod_speedycgi did not, that would just prove that
 speedycgi is more scalable.

 Now you could tell me "don't use unshared memory", but that's outside
 the bounds of the test.   The whole test concerns unshared memory.
 
 > I would
 > expect mod_perl to have the advantage when all processes are
 > fully-utilized because of the shared memory.

 Maybe.  There must be a benchmark somewhere that would show off
 mod_perl's advantages in shared memory.  Maybe a 100,000-line Perl
 program or something like that - it would have to be something where
 mod_perl is using *lots* of shared memory, because keep in mind that
 there are still going to be a whole lot fewer SpeedyCGI processes than
 there are mod_perl processes, so you would really have to go overboard
 in the shared-memory department.

 > It would be cool if speedycgi could somehow use a parent process
 > model and get the shared memory benefits too.

 > Speedy seems like it might be more attractive to ISPs, and it would be
 > nice to increase interoperability between the two projects.

 Thanks.  And please, I'm not trying to start a speedy vs mod_perl war.
 My original message was only to the speedycgi list, but now that it's
 on mod_perl I think I have to reply there too.

 But, there is a need for a little good PR on speedycgi's side, and I
 was looking for that.  I would rather just see mod_perl fixed if that's
 possible.  But the last time I brought up this issue (maybe a year ago)
 I was unable to convince the people on the mod_perl list that this
 problem even existed.

 Sam
