Geez, mod_perl rocks!!!! When I've run the benchmarks on the *strong*
machine the results were all different!
700+ RPS for the hello benchmark!!!
The only mistery left is why registry is so slow on the slow machine
relative to fast machine??? And yeah I've used the latest apache/mod_perl
Enough exclamation marks, see it for yourself. I'll repost the
Apache::Request vs CGI.pm on the strong machine tomorrow.
=head1 Benchmarking
Now we will run different benchmarks to learn what techniques should
be used and what not. The following SW/HW is used for the testing
purposes:
HW: Dual Pentium II (Deschutes) 400Mhz 512 KB cache 256MB
RAM (DIMM PC100)
SW: Linux (RH 6.1) Perl 5.005_03
Apache/1.3.12 mod_perl/1.22 mod_ssl/2.6.2 OpenSSL/0.9.5
The relevant Apache configuration:
MinSpareServers 10
MaxSpareServers 20
StartServers 10
MaxClients 20
MaxRequestsPerChild 10000
=head2 Apache::Registry and Perl Content Handler
=head3 The Light (Empty) Code
First lets see the overhead that C<Apache::Registry> adds. In order to
do that we will use an almost empty scripts, that only send a basic
header and one word as a content.
The I<registry.pl> script running under C<Apache::Registry>:
benchmarks/registry.pl
----------------------
use strict;
print "Content-type: text/plain\r\n\r\n";
print "Hello";
The Perl Content handler:
Benchmark/Handler.pm
--------------------
package Benchmark::Handler;
use Apache::Constants qw(:common);
sub handler{
$r = shift;
$r->send_http_header('text/html');
$r->print("Hello");
return OK;
}
1;
with settings:
PerlModule Benchmark::Handler
<Location /benchmark_handler>
SetHandler perl-script
PerlHandler Benchmark::Handler
</Location>
so we get C<Benchmark::Handler> preloaded.
We will use the C<Apache::RegistryLoader> to preload the script as
well, so the benchmark will be fair and only the processing time will
be measured. In the I<startup.pl> we add:
use Apache::RegistryLoader ();
Apache::RegistryLoader->new->handler(
"/perl/benchmarks/registry.pl",
"/home/httpd/perl/benchmarks/registry.pl");
And we if we check the I<Compiled Registry Scripts"> section with help
of L<Apache::Status|debug/Apache_Status_Embedded_Inter> (
http://localhost/perl-status?rgysubs ), where we see the listing of
the already compiled scripts :
Apache::ROOT::perl::benchmarks::registry_2epl
=head3 The Heavy Code
We we will see that the overhead is insignificant when the code itself
is significantly heavier and slower. Let's leave the above code
examples umodified but add some CPU intensive processing operation (it
can be also an IO operation or a database query.)
my $x = 100;
my $y = log ($x ** 100) for (0..10000);
=head3 Processing and Results
So now we can proceed with the benchmark. We will generate 5000
request with 10 as a concurrency level (i.e. emulating 10 concurrent
users):
% ab -n 5000 -c 10 http://localhost/perl/benchmarks/registry.pl
% ab -n 5000 -c 10 http://localhost/benchmark_handler
And the results:
=over
=item *
Light code:
Type RPS Av.CTime
------- --- -------
Registry 561 16
Handler 707 13
Heavy code:
Type RPS Av.CTime
------- --- -------
Registry 68 146
Handler 70 141
Reports:
-----------------------------------------------
RPS : Requests Per Second
Av. CTime : Average request processing time (msec) as seen by client
=head3 Conclusions
=over
=item * The Light Code
We can see that the average overhead added by C<Apache::Registry> is
about:
16 - 13 = 3 milli-seconds
per request.
The difference in speed in per cents is about 19%.
=item * The Heavy Code
If we are looking at the average processing time, we see that the time
delta between the two handlers is almost the same and have grown from
3 msec to 5 msec. Which means that the identical heavy code that has
been added was running for 130msec (146-16). It doesn't mean that the
added code itself has been running for 130msec. It means that it took
130msec for this code to be completed, in multi-process environment
where each process gets a time slice to use CPU.
If we run this extra code under plain Benchmark:
benchmark.pl
------------
use Benchmark;
timethis (1_000,
sub {
my $x = 100;
my $y = log ($x ** 100) for (0..10000);
});
% perl benchmark.pl
timethis 1000: 25 wallclock secs (24.93 usr + 0.00 sys = 24.93 CPU)
We see that it takes about 25 CPU seconds to complete.
The interesting thing is that when the server under test runs on a
slow machine the results are completely different. I'll present them
here for comparison:
Light code:
Type RPS Av.CTime
------- --- -------
Registry 61 160
Handler 196 50
Heavy code:
Type RPS Av.CTime
------- --- -------
Registry 12 822
Handler 67 149
[ReaderMETA]: Anyone knows to explain this phenomena? It's not clear
to me why adding the same CPU intensive code to the two handlers under
test on the slow machine, enlarges the delta of the average processing
time between the two handlers. I'd expect to see the same delta (of
110 msec) in this case, but that's not what's happenning. Any ideas?
=back
______________________________________________________________________
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED] | http://perl.org http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------