On Fri, Oct 31, 2008 at 2:54 PM, Mark Wagner <[EMAIL PROTECTED]> wrote:

> I've got a script I'm using to search through a list of Wikipedia
> article titles to find ones that match certain patterns.
>
> As-written, if you run it and supply '.*target.*' on standard input,
> it will process my test file in 125 seconds.  Make any of the changes
> mentioned in the comments, and the time needed will drop to 1.8
> seconds.  Why the difference?  Particularly interesting is that it
> seems to matter where the regex pattern came from: if it's from
> standard input, testing is slow; if it's assigned in the script,
> testing is fast.
>
> If it matters, I'm using Perl 5.8.8.
>
> To see the problem I'm having, download
>
> http://download.wikimedia.org/eswiki/20081018/eswiki-20081018-all-titles-in-ns0.gz
> (a 4.1-MB file), unzip it, and run the program supplying the name of
> the unzipped file.
>
> Thanks,
> Mark Wagner
>
> --------------
> binmode STDIN, ":utf8"; # Comment this out to speed things up
>
> while(<STDIN>)
> {
>        my $lines = 0;
>        my $lines2 = 0;
>        my $regex;
>        $regex = $_;
>        chomp $regex;
>
>        #$regex = '.*target.*'; # Or uncomment this to speed things up
>        open INFILE, "<", $ARGV[0];
>        binmode INFILE, ":utf8"; # Or comment this out to speed things up
>
>        while(<INFILE>)
>        {
>                my $target = $_;
>                chomp $target;
>                $target =~ s/_/ /g;
>
>                # Or make this case-insensitive to speed things up, or
>                # remove the start and end anchors to speed things up
>                print "Match\n" if($target =~ /^$regex$/);
>
>                $lines = $lines + 1;
>                if($lines >= 10000)
>                {
>                        $lines = 0;
>                        $lines2 += 10000;
>                        print STDERR "$lines2\r";
>                }
>        }
> }
>

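print "Match\n" if($target =~ /^$regex$/); # can be replaced with:
print "Match\n" if($target eq $regex);

Note that this only works when $regex holds a literal string to compare
against. 'eq' tests exact string equality, so a pattern such as
'.*target.*' still needs the regex match.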
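
A minimal sketch of how the two tests differ, and of compiling the pattern once with qr// before the inner loop instead of interpolating it on every line. The helper names count_matches and count_equal and the sample titles are made up for illustration. Whether precompiling changes the timing seen on Perl 5.8.8 is not something this sketch checks.

```perl
use strict;
use warnings;

# Hypothetical helper: count titles matching an anchored regex pattern.
sub count_matches {
    my ($pattern, @titles) = @_;
    my $re = qr/^$pattern$/;    # compile once, outside the loop
    my $count = 0;
    for my $title (@titles) {
        (my $target = $title) =~ s/_/ /g;    # same underscore cleanup as the script
        $count++ if $target =~ $re;
    }
    return $count;
}

# Hypothetical helper: count titles exactly equal to a literal string.
sub count_equal {
    my ($string, @titles) = @_;
    my $count = 0;
    for my $title (@titles) {
        (my $target = $title) =~ s/_/ /g;
        $count++ if $target eq $string;      # plain string comparison, no regex
    }
    return $count;
}

# Made-up sample data standing in for the title file.
my @titles = ('target', 'my_target_page', 'other');

print count_matches('.*target.*', @titles), "\n";   # prints 2
print count_equal('.*target.*', @titles), "\n";     # prints 0
print count_equal('target', @titles), "\n";         # prints 1
```

The regex version finds both titles containing "target", while 'eq' with the same pattern string finds none, because it compares the pattern text itself against each title.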
