Re: Apache::Leak

Stas Bekman Sun, 19 May 2002 22:36:07 -0700

Gregory Matthews wrote:
> Hello again.
> 
> Is Apache::Leak the easiest/best module to use for both detecting AND 
> allowing us to find the source of a memory leak in mod_perl?
> 
> If so, I am not finding any good documentation on its use.  I am not a 
> mod_perl guru and what I've read so far sounds rather involved.
> 
> Can someone point me to a location where good, laymen documentation 
> exists for this module. I would love to use it to ensure my code is 
> solid (I am writing a mod_perl app from scratch and do not want to stray 
> off the wrong coding path).


There is not much documentation online regarding this. But as Perrin has 
replied in the other thread, you shouldn't worry much about leaks, as long 
as you don't mess with circular references or autovivified variables, and 
you reset your globals.
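
To make the circular reference case concrete, here is a minimal sketch using only core Perl. The C<Node> class and the C<make_cycle()> helper are hypothetical names invented for this illustration. It counts how many objects are destroyed when two objects point at each other, with and without C<Scalar::Util::weaken()>:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

{
    package Node;    # hypothetical class, for illustration only
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent/child pair that reference each other, then let the
# lexicals go out of scope.  With $use_weaken the back-reference is weak.
sub make_cycle {
    my ($use_weaken) = @_;
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent}) if $use_weaken;
    return;
}

make_cycle(0);
my $after_strong = $destroyed;                  # cycle keeps both objects alive
make_cycle(1);
my $after_weak   = $destroyed - $after_strong;  # both objects are freed

print "destroyed without weaken: $after_strong, with weaken: $after_weak\n";
```

Without the weak back-reference, neither object is freed until the interpreter exits, which under mod_perl means until the child process quits.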

Here is the relevant section from our book, which should be published 
really soon now. It looks like the book is going to be called "Practical 
mod_perl". If you notice some wrong/unclear details or missing things, 
please let me know while we still can correct things.

=head2 Memory Leakage

It's normal for a process to grow when it processes its first few
requests.  They may be different requests, or the same requests
processing different data.  You may try to reload the same request a
few times and in many cases the process will stop growing after only
the second reload.  In any case, once a representative selection of
requests and inputs has been executed by a process, it won't usually
grow any more unless the code leaks memory.  If it grows after each
reload of an identical request, then there is probably a memory leak.

The experience might be different if the code works with some external
resource which can change between requests.  For example if the code
retrieves database records matching some query, it's possible that
from time to time the database will be updated and that a different
number of records will match the same query the next time it is
issued.  Depending on the techniques which you use to retrieve the
data, format it and send it to the user, the process may grow more or
less in size reflecting the changes in the data.

The easiest way to see whether the code is leaking is to run the
server in single process mode (C<httpd -X>), issue the same request
a few times, and see whether the process grows after each request.  If
it does, you probably have a memory leak.  If the code leaks 5kB per
request, then after 1000 requests to the leaking code 5MB of memory
will have leaked.  If in production you have 20 processes, this adds
up to 100MB of leakage once each process has served 1000 such
requests, i.e. after about 20,000 requests in total.
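
The sampling and the arithmetic above can be sketched in a few lines of Perl. This is only an illustration: the function names are made up for this example, and C<rss_kb()> assumes a Linux system where F</proc/self/status> contains a C<VmRSS> line (it returns undef elsewhere).

```perl
use strict;
use warnings;

# Resident set size of this process in kB, or undef if it cannot be read.
# Assumption: Linux, where /proc/self/status has a "VmRSS: N kB" line.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Given size samples taken after each identical request, return the
# growth between consecutive samples.
sub growth_between {
    my @samples = @_;
    return map { $samples[$_] - $samples[$_ - 1] } 1 .. $#samples;
}

# Projected leakage in kB: leak per request x requests per process x processes.
sub projected_leak_kb {
    my ($leak_kb, $requests_per_process, $processes) = @_;
    return $leak_kb * $requests_per_process * $processes;
}

my @growth = growth_between(10_240, 10_245, 10_250);   # made-up samples
my $total  = projected_leak_kb(5, 1000, 20);
print "growth per request: @growth kB; projected: $total kB\n";
```

Logging the value of C<rss_kb()> at the end of each request under C<httpd -X> gives the samples; a steady non-zero growth between identical requests is the signal to look for.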

This technique to detect leakage can be misleading if you are not
careful.  Suppose your process first runs some clean (non-leaking)
code which acquires 100kB of memory.  In an attempt to make itself
more efficient, Perl doesn't give the 100kB memory back to the
operating system.  The next time the process runs I<any> script, some
of the 100kB will be reused.  But if this time the process runs a
script that needs to acquire only 5kB, you won't see the process grow
even if the code has actually leaked these 5kB.  Now it might take 20
or more requests for the leaking script I<served by the same process>
before you would see that process start growing again.
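
The masking effect can be illustrated with a deliberately simplified model. This is a sketch, not a description of how Perl's allocator really works: it assumes the process holds a pool of reusable memory and only grows once a leak no longer fits into that pool. The function name and the starting size of 2000kB are invented for the example.

```perl
use strict;
use warnings;

# Simulate the process size when each request leaks $leak_kb, but the
# process already holds $free_kb of memory that can be reused before
# asking the operating system for more.  Returns the size after each request.
sub simulate_sizes {
    my ($start_kb, $free_kb, $leak_kb, $requests) = @_;
    my @sizes;
    my $size = $start_kb;
    for (1 .. $requests) {
        if ($free_kb >= $leak_kb) {
            $free_kb -= $leak_kb;            # leak absorbed by reusable memory
        }
        else {
            $size += $leak_kb - $free_kb;    # pool exhausted: process must grow
            $free_kb = 0;
        }
        push @sizes, $size;
    }
    return @sizes;
}

my @sizes = simulate_sizes(2000, 100, 5, 22);
my ($first_growth) = grep { $sizes[$_ - 1] > 2000 } 1 .. @sizes;
print "first visible growth at request $first_growth\n";
```

With a 100kB pool and a 5kB leak, the first 20 requests leave the process size unchanged, and growth only becomes visible on the request after that.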

A process may leak memory for several reasons: badly written system
C/C++ libraries used in the httpd binary and badly written Perl code
are the most common.  Perl modules may also use C libraries, and these
might leak memory as well.  Some operating systems have been known to
have problems with their memory management functions.

If you know that your own code has no leaks, then to detect leaks in
C/C++ libraries you should either use the memory-sampling technique
described above, or use C/C++ developer tools designed for this
purpose.  This topic is beyond the scope of this book.

The C<Apache::Leak> module (derived from C<Devel::Leak>) might help
you to detect leaks in your code.  For example:

   file:leaktest.pl
   ----------------
   use Apache::Leak;

   my $global = "FooA";

   leak_test {
       $$global = 1;
       ++$global;
   };

The argument to C<leak_test()> is an anonymous sub or a block, so you
can just throw in any code you suspect might be leaking.  Beware, it
will run the code twice!  The first time in, new C<SV>s are created,
but this does not mean the code is leaking.  The second pass will give
better evidence.  You do not need to be inside mod_perl to use it.
From the command line, the above script outputs:

   ENTER: 1482 SVs
   new c28b8 : new c2918 :
   LEAVE: 1484 SVs
   ENTER: 1484 SVs
   new db690 : new db6a8 :
   LEAVE: 1486 SVs
   !!! 2 SVs leaked !!!

This module uses the simple approach of walking the Perl internal
table of allocated I<Scalar Values> (SVs).  It records them before
entering the scope of the code under test and after leaving the scope.
At the end a comparison of the two sets is performed, sv_dump() is
called for any I<things> which did not exist in the first set and the
difference in counts is reported.  Notice that you will only see the
dumps of SVs if Perl was built with the C<-DDEBUGGING> option.  In our
example it will dump two SVs twice, since the same code is run twice.
The volume of output is too great to be presented here.

Our example leaks because C<$$global = 1;> creates a new global
variable C<FooA> (with the value of C<1>) which will not be destroyed
until this module is destroyed.  Under mod_perl the module doesn't get
destroyed until the process quits.  When the code is run the second
time, C<$global> will contain I<FooB> because of the increment code at
the end of the first run.  Consider:

   $foo = "AAA";
   print "$foo\n";
   $foo++;
   print "$foo\n";

which prints:

   AAA
   AAB

So every time the code is executed a new variable (I<FooC>, I<FooD>
etc.) will spring into existence.
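
You can watch these variables accumulate without any special module by counting the entries in the package's symbol table after each run. The following is a small sketch; the package name C<LeakDemo> and the helper C<foo_symbols()> are invented for this illustration:

```perl
use strict;
use warnings;

{
    package LeakDemo;              # hypothetical package name for this sketch

    my $global = "FooA";

    sub test {
        no strict 'refs';          # symbolic references need strict refs off
        $$global = 1;              # creates $LeakDemo::FooA, then FooB, ...
        ++$global;                 # magical string increment: FooA -> FooB
    }
}

# Count the package variables named Foo<letter> that exist so far.
sub foo_symbols {
    return scalar grep { /^Foo[A-Z]$/ } keys %LeakDemo::;
}

my @counts;
for (1 .. 3) {
    LeakDemo::test();
    push @counts, foo_symbols();
}
print "Foo* symbols after each run: @counts\n";   # prints: 1 2 3
```

The count goes up by one on every call and never comes back down, which is exactly the growth that C<leak_test()> reports.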

C<Apache::Leak> is not very user-friendly. You may want to take a look
at C<B::LexInfo>.  It is possible to see something that might appear
to be a leak, but is actually just a Perl optimization.  For example,
consider this code:

   sub test { my ($string) = @_;}
   test("a string");

C<B::LexInfo> will show you that Perl does not release the value from
C<$string>, unless you undef() it.  This is because Perl anticipates
that the memory will be needed for another string the next time the
subroutine is entered.  You'll see similar behavior for C<@array>
length, C<%hash> keys, and scratch areas of the pad-list for operations
such as C<join()>, C<.>, etc.

Let's look at how C<B::LexInfo> works:

   file:leaktest1.pl
   ----------------
   package LeakTest1;
   use B::LexInfo ();

   sub test { my ($string) = @_;}

   my $lexi = B::LexInfo->new;
   my $diff = $lexi->cvrundiff('LeakTest1::test', "a string");
   print $$diff;

This code creates a new C<B::LexInfo> object, and then runs
cvrundiff(), which creates two snapshots of the lexical variables'
padlists: one before LeakTest1::test() is called, and the other after
it has been called (in this case with the argument I<"a string">).
Then it calls C<diff -u> to generate the difference between the
snapshots.  In case you aren't familiar with how C<diff> works: C<-> at
the beginning of a line means that the line was removed, C<+> means
that the line was added, and the other lines are there to show the
context in which the difference was found.  Here is the output:

   --- /tmp/B_LexInfo_3099.before       Tue Feb 13 20:09:52 2001
   +++ /tmp/B_LexInfo_3099.after        Tue Feb 13 20:09:52 2001
   @@ -2,9 +2,11 @@
      {
        'LeakTest1::test' => {
          '$string' => {
   -        'TYPE' => 'NULL',
   +        'TYPE' => 'PV',
   +        'LEN' => 9,
            'ADDRESS' => '0x8146d80',
   -        'NULL' => '0x8146d80'
   +        'PV' => 'a string',
   +        'CUR' => 8
          },
          '__SPECIAL__1' => {
            'TYPE' => 'NULL',

Perl tries to optimize for speed by keeping the memory for C<$string>
allocated, even after the variable has gone out of scope.

If we run the first example with C<B::LexInfo>:

   file:leaktest2.pl
   -----------------
   package LeakTest2;
   use B::LexInfo ();

   my $global = "FooA";

   sub test {
       $$global = 1;
       ++$global;
   }

   my $lexi = B::LexInfo->new;
   my $diff = $lexi->cvrundiff('LeakTest2::test');
   print $$diff;

and the result:

   --- /tmp/B_LexInfo_3103.before Tue Feb 13 20:12:04 2001
   +++ /tmp/B_LexInfo_3103.after         Tue Feb 13 20:12:04 2001
   @@ -5,7 +5,7 @@
            'TYPE' => 'PV',
            'LEN' => 5,
            'ADDRESS' => '0x80572ec',
   -        'PV' => 'FooA',
   +        'PV' => 'FooB',
            'CUR' => 4
          }
        }

We can clearly see the leakage, since the value of the C<PV> entry has
changed from one string to a different one.  Compare this with the
previous example, where a variable didn't exist and sprang into
existence for optimization reasons.  If you are still confused,
probably the best approach is to run the C<diff> twice when you test
your code.

Running the cvrundiff() function twice in both our examples:

   file:leaktest3.pl
   -----------------
   package LeakTest2;
   use B::LexInfo ();

   my $global = "FooA";

   sub test {
       $$global = 1;
       ++$global;
   }

   my $lexi = B::LexInfo->new;
   my $diff = $lexi->cvrundiff('LeakTest2::test');
   $diff    = $lexi->cvrundiff('LeakTest2::test');
   print $$diff;

and the output:

   --- /tmp/B_LexInfo_3103.before Tue Feb 13 20:12:04 2001
   +++ /tmp/B_LexInfo_3103.after         Tue Feb 13 20:12:04 2001
   @@ -5,7 +5,7 @@
            'TYPE' => 'PV',
            'LEN' => 5,
            'ADDRESS' => '0x80572ec',
   -        'PV' => 'FooB',
   +        'PV' => 'FooC',
            'CUR' => 4
          }
        }

We can see the leak again, since the value of C<PV> has changed again:
from I<FooB> to I<FooC>. And if we look at the second case:

   file:leaktest4.pl
   -----------------
   package LeakTest1;
   use B::LexInfo ();

   sub test { my ($string) = @_;}

   my $lexi = B::LexInfo->new;
   my $diff = $lexi->cvrundiff('LeakTest1::test', "a string");
   $diff    = $lexi->cvrundiff('LeakTest1::test', "a string");
   print $$diff;

no output is produced, since there is no difference between the
snapshots taken before and after the second run.  All the data
structures are allocated during the first execution, so we are sure
that no memory is leaking here.
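
The same "run it twice and look only at the second run" idea works with any measurement, not just C<B::LexInfo>.  Here is a minimal sketch in plain Perl.  The helper C<second_run_delta()> and the two example subs are invented for this illustration, and the number of hash keys stands in for whatever metric you choose to sample:

```perl
use strict;
use warnings;

# Run $code once to warm up, then once more, and report the change in
# $metric->() caused by the second run only.
sub second_run_delta {
    my ($code, $metric) = @_;
    $code->();                    # first run: one-time allocations happen here
    my $before = $metric->();
    $code->();                    # second run: should add nothing if clean
    return $metric->() - $before;
}

my (%stable, %growing);
my $n = 0;
my $clean = sub { $stable{config} = 'loaded' };      # same key every time
my $leaky = sub { $growing{ 'req' . $n++ } = 1 };    # new key on every run

my $clean_delta = second_run_delta($clean, sub { scalar keys %stable });
my $leaky_delta = second_run_delta($leaky, sub { scalar keys %growing });
print "clean: $clean_delta, leaky: $leaky_delta\n";
```

The clean sub shows no change on its second run, while the leaky one keeps adding data every time it is called.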

C<Apache::Status> includes a C<StatusLexInfo> option which can show
you the internals of your code via C<B::LexInfo>. See Chapter
[XREF=debug.pod].


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
