[performance/benchmark] Choosing MaxClients directive

Stas Bekman Tue, 06 Jun 2000 14:18:06 -0700

Here is another benchmark for your enjoyment. Comments are welcome!
-------------------------------------------------------------------
Choosing MaxClients directive

It's important to set C<MaxClients> on the basis of the resources
your machine has.  The C<MaxClients> directive sets the limit on the
number of simultaneous requests that can be supported; no more than
this number of child server processes will be created.  To configure
more than 256 clients, you must edit the C<HARD_SERVER_LIMIT> entry in
I<httpd.h> and recompile.

With a plain Apache server, it's no big deal if you run many child
processes, since each process is about 1MB (most of it shared) and
doesn't eat a lot of your RAM.  The situation is different with
mod_perl, where the processes can grow to 10MB and more.  If you have
C<MaxClients> set to 50, that is 50 x 10MB = 500MB.  Do you have 500MB
of RAM dedicated to the mod_perl server?

With a high C<MaxClients>, if you get a high load the server will try
to serve all requests immediately.  Your CPU will have a hard time
keeping up, and if the child size multiplied by the number of running
children is larger than the total available RAM, your server will
start swapping.  Swapping slows everything down, so requests take
longer to complete and pile up, which makes things even slower, until
eventually your machine dies.  It's important that you take pains to
ensure that swapping does not normally happen.  Swap space is an
emergency pool, not a resource to be used routinely.  If you are low
on memory and you badly need it, buy it.  Memory is cheap.

We want this value to be as small as possible, because that way we
limit the resources used by the server children.  Since we can
restrict each child's process size, as we will learn later, the
calculation of C<MaxClients> is pretty straightforward:

               Total RAM Dedicated to the Webserver
  MaxClients = ------------------------------------
                     MAX child's process size

So if I have 400MB left for mod_perl to run with, and I know that each
child is limited to 10MB of memory, I can set C<MaxClients> to 40.
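The simple calculation can be sketched in a few lines of Perl.  This
is a minimal illustration that assumes no memory is shared and all
sizes are in MB; C<max_clients> and C<needed_ram> are hypothetical
helper names, not part of Apache or mod_perl.

```perl
use strict;
use warnings;

# Hypothetical helper: how many children fit in the RAM dedicated to the
# server when no memory is shared (all sizes in MB).
sub max_clients {
    my ($total_ram, $max_child_size) = @_;
    die "child size must be positive\n" unless $max_child_size > 0;
    # Round down: you cannot run a fraction of a child.
    return int($total_ram / $max_child_size);
}

# Hypothetical helper: worst-case RAM needed when every child has grown
# to its maximum size (again assuming nothing is shared).
sub needed_ram {
    my ($max_clients, $max_child_size) = @_;
    return $max_clients * $max_child_size;
}

printf "MaxClients for 400MB / 10MB children: %d\n", max_clients(400, 10);
printf "RAM needed for 50 children of 10MB: %dMB\n", needed_ram(50, 10);
```

Running it prints 40 for the 400MB example and 500MB for 50 children
of 10MB, the figures used in the text.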

You may be wondering what happens to your server if there are ever
more concurrent users than C<MaxClients>.  This situation is signaled
by the following warning message in the C<error_log>:

  [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
  consider raising the MaxClients setting

Technically there is no problem: any connection attempts over the
C<MaxClients> limit will normally be queued, up to a number based on
the C<ListenBacklog> directive.  When a child process becomes free at
the end of another request, the queued connection will be served.

But it B<is an error>, because clients are put in the queue rather
than served immediately, even though they do not get an error
response.  You can let this condition persist as a trade-off between
available system resources and response time, but sooner or later you
will need more RAM so you can start more child processes.  It is best
to avoid reaching this condition at all; if you reach it often, you
should start to worry about it.
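To find out how often you hit the limit, you can count the warning
shown above in your I<error_log>.  The following is a minimal Perl
sketch, assuming each log line looks like the example message (a
bracketed timestamp followed by C<[error] server reached MaxClients
setting>); the function name C<count_maxclients_hits> and the per-day
grouping are our own choices for illustration.

```perl
use strict;
use warnings;

# Count "server reached MaxClients setting" warnings per day.
# Assumes the error_log line format shown in the text, e.g.
#   [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting, ...
sub count_maxclients_hits {
    my ($fh) = @_;
    my %per_day;
    while (my $line = <$fh>) {
        next unless $line =~ /server reached MaxClients setting/;
        # Capture "Sun Jan 24" and "1999" from the leading timestamp;
        # " +" also accepts a space-padded single-digit day.
        if ($line =~ /^\[(\w{3} \w{3} +\d+) [\d:]+ (\d{4})\]/) {
            $per_day{"$1 $2"}++;
        }
        else {
            $per_day{unknown}++;
        }
    }
    return \%per_day;
}

# Usage: pass the path to your error_log as the only argument.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    my $hits = count_maxclients_hits($fh);
    printf "%-16s %d\n", $_, $hits->{$_} for sort keys %$hits;
}
```

A day with many hits is a sign that you should either add RAM or
revisit your C<MaxClients> calculation.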

Fortunately, the picture is much brighter in knowledgeable hands: if
you recall the discussion about shared memory, much less memory is
actually used.  If nothing were shared, C<MaxClients> children would
need C<MaxClients * Max_Process_Size>; since the shared part exists
only once, C<(MaxClients - 1) * Shared_RAM_per_Child> of that is never
actually allocated.  We have developed this formula:

               Total_RAM + Shared_RAM_per_Child * (MaxClients - 1)
  MaxClients = ---------------------------------------------------
                               Max_Process_Size

which is:

                   Total_RAM - Shared_RAM_per_Child
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child

Let's roll some calculations:

  Total_RAM            = 500MB
  Max_Process_Size     =  10MB
  Shared_RAM_per_Child =   4MB

              500 - 4
 MaxClients = --------- = 82
               10 - 4

With no sharing in place:

                 500
 MaxClients = --------- = 50
                 10

With sharing in place, if your numbers are similar to the ones in our
example, you can have 64% more servers without buying more RAM (82
compared to 50).

If you improve sharing, and the sharing level is maintained throughout
the child's life, let's say:

  Total_RAM            = 500MB
  Max_Process_Size     =  10MB
  Shared_RAM_per_Child =   8MB

              500 - 8
 MaxClients = --------- = 246
               10 - 8

you can have 392% more servers (246 compared to 50)!
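The sharing-aware calculation can be sketched the same way.  This
minimal Perl version assumes the model where each child may grow to
C<Max_Process_Size> but the shared part is counted only once, so
C<N> children need C<N * Max_Process_Size - (N - 1) *
Shared_RAM_per_Child> of RAM.  Sizes are in MB, the result is rounded
down, and C<max_clients_shared> is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: MaxClients when part of each child's memory is shared.
# Model (sizes in MB): N children need
#     N * $max_size - (N - 1) * $shared  <=  $total_ram
# Solving for N gives N = ($total_ram - $shared) / ($max_size - $shared).
sub max_clients_shared {
    my ($total_ram, $max_size, $shared) = @_;
    die "shared part must be smaller than the process size\n"
        unless $shared < $max_size;
    # Round down: a fractional child cannot be started.
    return int(($total_ram - $shared) / ($max_size - $shared));
}

for my $shared (0, 4, 8) {
    printf "Total=500MB Max=10MB Shared=%dMB -> MaxClients=%d\n",
        $shared, max_clients_shared(500, 10, $shared);
}
```

With C<$shared> set to 0 the function reduces to the simple
no-sharing formula, which is a handy sanity check.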

There is one more nuance to remember.  The number of requests per
second that your server can serve won't grow linearly as you raise the
value of C<MaxClients>.  Even if you have a lot of RAM available and
set C<MaxClients> as high as possible, you will see that beyond a
certain value, increasing C<MaxClients> further gives you no
improvement in performance.

The more clients are running, the more CPU time is required and the
smaller the share of CPU time each process receives.  The response
latency (the time to respond to a request) grows, so you won't see the
expected improvement.  Let's use the C<Apache::Benchmark> module to
demonstrate this.

This is the test handler that we used.  As you can see, it does mostly
CPU-intensive computations.

  httpd/perl/Benchmark/HandlerMiddle.pm
  -------------------------------------
  package Benchmark::HandlerMiddle;
  use strict;
  use Apache::Constants qw(:common);

  sub handler {
    my $r = shift;
    $r->send_http_header('text/html');
    $r->print("Hello");
    # CPU-bound busy work: compute log(100**100) 101 times
    my $x = 100;
    my $y;
    $y = log($x ** 100) for 0 .. 100;
    return OK;
  }
  1;

The following two files are the test specification and the extra
configuration that is added to I<httpd.conf> before each server
restart (the server is restarted for each subtest).  Notice that the
test specification includes the httpd variables as well, so for each
subtest I<httpd.conf> is modified to include the varying value of
C<MaxClients> and the constant values of C<StartServers> and
C<MaxRequestsPerChild>:

  tests/maxclients/maxclients.t
  -----------------------------
  my $uri_prefix = "http://$c{default}{hostname}:$c{default}{port}";
  my $maxclients =
    {
     name => "handler_heavy",
     desc => "This test tests how the MaxClients directive
              influences the performance of the server",
     uri => "$uri_prefix/benchmark_handler_middle",
     concurrency  => [300],
     connections  => [50000],
     MaxClients   => [100,150,200,250],
     StartServers => 100,
     MaxRequestsPerChild => 0,
    };
  @entries = ($maxclients);
  1;

  tests/maxclients/maxclients.conf
  --------------------------------
  PerlModule Benchmark::HandlerMiddle
  <Location /benchmark_handler_middle>
    SetHandler perl-script
    PerlHandler Benchmark::HandlerMiddle
  </Location>
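How each subtest combines one C<MaxClients> value with the constant
server parameters can be sketched as follows.  This is a hypothetical
illustration, not the C<Apache::Benchmark> code: C<conf_fragments> is
a made-up name, and it only builds the directive lines that would be
appended to I<httpd.conf> before each restart, using the
C<MaxClients> values from the results table below.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Apache::Benchmark implementation): turn a
# test spec into one httpd.conf fragment per MaxClients value.  The
# StartServers and MaxRequestsPerChild values stay constant across subtests.
sub conf_fragments {
    my ($spec) = @_;
    my @fragments;
    for my $max_clients (@{ $spec->{MaxClients} }) {
        push @fragments, join '',
            "MaxClients $max_clients\n",
            "StartServers $spec->{StartServers}\n",
            "MaxRequestsPerChild $spec->{MaxRequestsPerChild}\n";
    }
    return @fragments;
}

my $spec = {
    MaxClients          => [100, 150, 200, 250],
    StartServers        => 100,
    MaxRequestsPerChild => 0,
};

my $n = 0;
for my $fragment (conf_fragments($spec)) {
    $n++;
    print "# subtest $n (restart the server after appending)\n$fragment\n";
}
```

Keeping only one parameter varying per test makes it clear which
setting is responsible for any change in the results.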

And the results (the machine under test was a monster!):

  MaxClients  | avtime completed failed    rps
  --------------------------------------------
         150  |    342     50000      0    791
         200  |    339     50000      0    785
         100  |    333     50000      0    755
         250  |    402     50000      0    741
  ---------------------------------------------
  Non-varying sub-test parameters:
  ---------------------------------------------
  MaxRequestsPerChild : 0
  StartServers        : 100
  concurrency         : 300
  connections         : 50000

--------------------------------------------------------------------------

When looking at the I<Requests Per Second> (rps) column, you can
clearly see that with a concurrency level of 300 the performance is
almost identical for C<MaxClients> values of 150 and 200 (791 and 785
rps), but goes down for the value of 100 (not enough processes), and
we get even worse results for the value of 250.  Note that we kept the
server fully loaded, since the number of concurrent requests was
always higher than the number of available processes, which means that
some requests were queued rather than responded to immediately.  When
the number of processes went above 200, the processes spent more time
sleeping and context switching instead of doing real processing.  On
the other hand, with only 100 available processes the CPU was not
fully loaded, even though we had plenty of memory available.  In our
case, 150 was the best of the values we tried.

This leads us to an interesting discovery, which we can formulate as
follows: adding RAM might not improve performance if your CPU is
already fully loaded with the current number of processes; if you
start more of them, performance will actually degrade.  On the other
hand, if you upgrade your machine with a very powerful CPU but don't
have enough memory to keep it busy full time, you've just wasted
money.  You would have done better to buy a less powerful, less
expensive CPU and spend the rest of the budget on more RAM.

To discover these limits for your own server, run benchmarks just like
we did.  By varying the configuration parameters and the load, you
will be able to find the underloaded or overloaded component; in our
example, the two candidates were the CPU and the RAM.

You can tune your machine using reports like the one in our example,
by analyzing either the I<Requests Per Second> (I<rps>) column, which
shows the throughput of your server, or the I<Average processing time>
(I<avtime>) column, which shows the latency of your server.  Take more
samples to build smoother graphs, and pick the value of C<MaxClients>
where the curve bends: after reaching its maximum on a throughput
graph, or after reaching its minimum on a latency graph.
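Reading the best value off a report can be sketched in Perl.  The rows
below are copied from our results table; the sketch simply reports
the sampled C<MaxClients> with the highest I<rps> and the one with the
lowest I<avtime>.  It does not fit a curve or detect where it bends,
and C<best_by> is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rows from the results table: [MaxClients, avtime, rps].
my @samples = (
    [150, 342, 791],
    [200, 339, 785],
    [100, 333, 755],
    [250, 402, 741],
);

# Return the row with the largest ($want_max true) or smallest value
# in column $col.
sub best_by {
    my ($col, $want_max, @rows) = @_;
    my ($best) = sort {
        $want_max ? $b->[$col] <=> $a->[$col] : $a->[$col] <=> $b->[$col]
    } @rows;
    return $best;
}

my $by_rps    = best_by(2, 1, @samples);   # throughput: higher is better
my $by_avtime = best_by(1, 0, @samples);   # latency: lower is better
printf "Highest rps:   MaxClients=%d (%d rps)\n", $by_rps->[0], $by_rps->[2];
printf "Lowest avtime: MaxClients=%d (avtime %d)\n", $by_avtime->[0], $by_avtime->[1];
```

On these four samples the two criteria disagree: throughput peaks at
150, while the lowest I<avtime> is at 100.  Judging from a handful of
points is rough, which is why more samples give a more trustworthy
choice.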




_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://perl.org     http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org