I recently had a similar problem. A regex that worked fine in sample code
was a dog in the web-server code. It only happened with really long strings.
I tracked the problem down to this passage in the 'perlre' manpage:

       WARNING: Once Perl sees that you need one of "$&", "$`", or "$'"
       anywhere in the program, it has to provide them for every pattern
       match.  This may substantially slow your program.  Perl uses the
       same mechanism to produce $1, $2, etc, so you also pay a price for
       each pattern that contains capturing parentheses.  (To avoid this
       cost while retaining the grouping behaviour, use the extended
       regular expression "(?: ... )" instead.)  But if you never use
       "$&", "$`" or "$'", then patterns without capturing parentheses
       will not be penalized.  So avoid "$&", "$'", and "$`" if you can,
       but if you can't (and some algorithms really appreciate them),
       once you've used them once, use them at will, because you've
       already paid the price.  As of 5.005, "$&" is not so costly as the
       other two.
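
To illustrate the "(?: ... )" advice in the passage above, here is a minimal sketch. The sample string and variable names are made up for illustration; both patterns group "CG" for the "+" quantifier, but only the first one saves anything.

```perl
use strict;
use warnings;

# Hypothetical sample string, for illustration only.
my $seq = 'ATCGCGTA';

# Capturing group: the text matched by the last repetition is saved in $1.
my $captured;
if ($seq =~ /AT(CG)+/) {
    $captured = $1;
}

# Non-capturing group: same grouping for the "+" quantifier, nothing saved.
my $matched = ($seq =~ /AT(?:CG)+/) ? 1 : 0;

print "captured=$captured matched=$matched\n";
```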

Basically, one of the modules the web-app was 'use'ing needed $', but my
test code didn't 'use' that module. The result was pretty dramatic in this
case: something that took approx 1 second in the test code was timing out
after 2 minutes in the web-server.
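
If the code that needs $`, $& or $' is your own, one way to do without them is to rebuild the same substrings from the match offsets in @- and @+ (available since Perl 5.6.0). A minimal sketch, with a made-up sample string and variable names:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $str = 'AACGTT';

my ($pre, $match, $post);
if ($str =~ /CG/) {
    # $-[0] and $+[0] are the start and end offsets of the last match.
    $pre   = substr($str, 0, $-[0]);              # text before the match
    $match = substr($str, $-[0], $+[0] - $-[0]);  # the matched text
    $post  = substr($str, $+[0]);                 # text after the match
}

print "pre=$pre match=$match post=$post\n";
```

This only helps for code you control. A third-party module that uses the special variables still has to be patched or avoided.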

What I did in the end was something like this:

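Somewhere in the code, add this so that it runs when a request hits:

# Dump the file path of every module loaded so far.
open(my $fh, '>', '/tmp/modulelist') or die "Can't write /tmp/modulelist: $!";
print $fh join("\n", values %INC), "\n";
close($fh);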

This creates a file which lists all the loaded modules. Then after sticking
a request through the browser, do something like:

grep \$\' `cat /tmp/modulelist`
grep \$\& `cat /tmp/modulelist`
grep \$\` `cat /tmp/modulelist`

to try and track down the offending module. You'll get quite a few false
hits (comments, etc), but you might find an offending module. The main ones
I found were:

Parse::RecDescent
Net::DNS

and a couple of others I can't remember now. I fixed Net::DNS myself and
sent a patch to the maintainer, but haven't heard anything. If you find this
happens to be your problem as well, ask me for the patched version.
Parse::RecDescent makes heavy use of the above vars, no chance of fixing
that in a hurry.
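
The same scan can also be done from inside Perl instead of with grep. This is only a rough sketch: find_match_vars is a name I made up, and it only skips whole-line comments, so you will still get false hits from strings, POD and trailing comments.

```perl
use strict;
use warnings;

# Return "file:line: text" for every line in the given files that
# mentions one of the three match variables. (Hypothetical helper.)
sub find_match_vars {
    my @files = @_;
    my @hits;
    for my $file (@files) {
        # %INC values are normally file paths; skip anything else.
        next unless defined $file && !ref $file && -f $file;
        open(my $fh, '<', $file) or next;    # skip unreadable files
        while (my $line = <$fh>) {
            next if $line =~ /^\s*#/;        # skip whole-line comments
            if ($line =~ /\$[&`']/) {
                chomp $line;
                push @hits, sprintf('%s:%d: %s', $file, $., $line);
            }
        }
        close($fh);
    }
    return @hits;
}

# Scan every module loaded so far in this process.
print "$_\n" for find_match_vars(sort values %INC);
```

Run it from the request handler (or write the result to a file) so that %INC reflects what the web-server has actually loaded, not just what a test script loads.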

Rob

----- Original Message -----
From: "Paul Mineiro" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 24, 2002 11:01 AM
Subject: Re: slow regex [BENCHMARK]


> Paul Mineiro wrote:
>
> i've cleaned up the example to tighten the case:
>
> the mod perl code  snippet is:
>
> ---
>
>   my @cg;
>
>   open DIL, '>', "/tmp/seqdata";
>   print DIL $seq;
>   close DIL;
>
>   warn "length seq = @{[length ($seq)]}";
>
>   my $t = timeit (1, sub {
>                         while ($seq =~ /CG/g)
>                           {
>                             push @cg, pos ($seq);
>                           }
>                      });
>
>   print STDERR timestr ($t), "\n";
>
> ---
>
> which yields
> length seq = 200001 at
> /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm line 634,
> <GEN1> line 102
> 16 wallclock secs (15.56 usr +  0.01 sys = 15.57 CPU) @  0.06/s (n=1)
>
> and the perl script (command line) version is:
>
> ---
>
> #!/usr/bin/perl
>
> use Benchmark;
> use strict;
>
> open DIL, '<', "/tmp/seqdata";
> my $seq = <DIL>;
> close DIL;
>
> warn "length seq is @{[length $seq]}";
>
> my @cg;
>
> my $t = timeit (1, sub {
>                       while ($seq =~ /CG/g)
>                         {
>                           push @cg, pos ($seq);
>                         }
>                    });
>
> print STDERR timestr ($t), "\n";
>
> ---
> which yields:
>
> length seq is 200001 at ./t.pl line 10.
>  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
>
> the data is pretty big, so i didn't attach it, but feel free to contact
> me directly for it.
>
> -- p
>
> >hi.  i'm running mod_perl 1.26 + apache 1.3.14 + perl 5.6.1
> >
> >i have a loop in a mod_perl handler like so:
> >----
> >  my $stime = time ();
> >
> >  while ($seq =~ /CG/og)
> >    {
> >      push @cg,  pos ($seq);
> >    }
> >
> >  my $etime = time ();
> >
> >  warn "time was: ", scalar localtime ($stime), " ",
> >        scalar localtime ($etime), " ", $etime - $stime;
> >----
> >
> >under mod_perl this takes 23 seconds.  running the perl "by hand" (via
> >extracting this piece into a separate perl script) on the same data takes
> >less than 1 second.
> >
> >has anyone seen this kind of extreme slowdown before?
> >
> >-- p
> >
> >info:
> >
> >apache build options:
> >
> >CFLAGS="-g -g -O3 -funroll-loops" \
> >LDFLAGS="-L/home/aerives/lib -L/home/aerives/lib/mysql" \
> >LIBS="-L/home/aerives/genegrokker-interface/lib
> >-L/home/aerives/genegrokker-interface/ext/lib -L/home/aerives/lib
> >-L/home/aerives/lib/mysql" \
> >./configure \
> >"--prefix=/home/aerives/genegrokker-interface/ext" \
> >"--enable-rule=EAPI" \
> >"--enable-module=most" \
> >"--enable-shared=max" \
> >"--with-layout=GNU" \
> >"--disable-rule=EXPAT" \
> >"$@"
> >
> >mod_perl build options:
> >
> >configure_options="PERL_USELARGEFILES=0 USE_APXS=1
> >WITH_APXS=$PLAYPEN_ROOT/ext/sbin/apxs EVERYTHING=1
> >INC=$PLAYPEN_ROOT/ext/include -DEAPI"
> >
> >perl -V:
> >Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
> >  Platform:
> >    osname=linux, osvers=2.4.13, archname=i386-linux
> >    uname='linux duende 2.4.13 #1 wed oct 31 19:18:07 est 2001 i686 unknown '
> >    config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux
> >-Dprefix=/usr -Dprivlib=/usr/share/perl/5.6.1 -Darchlib=/usr/lib/perl/5.6.1
> >-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5
> >-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.6.1
> >-Dsitearch=/usr/local/lib/perl/5.6.1 -Dman1dir=/usr/share/man/man1
> >-Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl
> >-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Duseshrplib
> >-Dlibperl=libperl.so.5.6.1 -Dd_dosuid -des'
> >    hint=recommended, useposix=true, d_sigaction=define
> >    usethreads=undef use5005threads=undef useithreads=undef
> >usemultiplicity=undef
> >    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
> >    use64bitint=undef use64bitall=undef uselongdouble=undef
> >  Compiler:
> >    cc='cc', ccflags ='-DDEBIAN -fno-strict-aliasing -I/usr/local/include
> >-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
> >    optimize='-O2',
> >    cppflags='-DDEBIAN -fno-strict-aliasing -I/usr/local/include'
> >    ccversion='', gccversion='2.95.4  (Debian prerelease)', gccosandvers=''
> >    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> >    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> >    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> >lseeksize=8
> >    alignbytes=4, usemymalloc=n, prototype=define
> >  Linker and Libraries:
> >    ld='cc', ldflags =' -L/usr/local/lib'
> >    libpth=/usr/local/lib /lib /usr/lib
> >    libs=-lgdbm -ldb -ldl -lm -lc -lcrypt
> >    perllibs=-ldl -lm -lc -lcrypt
> >    libc=/lib/libc-2.2.4.so, so=so, useshrplib=true, libperl=libperl.so.5.6.1
> >  Dynamic Linking:
> >    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
> >    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
> >
> >
> >Characteristics of this binary (from libperl):
> >  Compile-time options: USE_LARGE_FILES
> >  Built under linux
> >  Compiled at Jan 11 2002 04:09:18
> >  %ENV:
> >    PERL5LIB="/home/aerives/genegrokker-interface/lib/perl5:/home/aerives/genegrokker-interface/ext/lib/perl5:/home/aerives/lib/perl5"
> >  @INC:
> >    /home/aerives/genegrokker-interface/lib/perl5
> >    /home/aerives/genegrokker-interface/ext/lib/perl5
> >    /home/aerives/lib/perl5
> >    /usr/local/lib/perl/5.6.1
> >    /usr/local/share/perl/5.6.1
> >    /usr/lib/perl5
> >    /usr/share/perl5
> >    /usr/lib/perl/5.6.1
> >    /usr/share/perl/5.6.1
> >    /usr/local/lib/site_perl
> >
>
>
>
>
