A couple of questions:

1- Are you closing <IN> anywhere?
2- Are you doing a foreach $keyword on every line of every file? That *could* get slow.
3- Does it start out fast and get slower as it goes, or is it slow from the start?
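On question 2, here is a rough sketch of one way to avoid running a separate regex per keyword: join the keywords into a single alternation, compile it once, and scan the text in one pass. This is only an illustration under assumptions, not a drop-in fix. The keyword list and sample text are made up, the "words in caps" part is simplified to [A-Z]+, and [^\S\n] (whitespace other than newline) stands in for \s so a match cannot run across lines.

```perl
use strict;
use warnings;

# Hypothetical keyword list and sample text, for illustration only.
my @all_keywords = ('LICENSE', 'SERVICE', 'PURCHASE');
my $text = join "\n",
    'Some body text mentioning a license agreement.',
    '  MASTER SERVICE AGREEMENT  ',
    'More body text.',
    'PURCHASE AGREEMENT',
    '';

# Build one alternation from all keywords; quotemeta escapes any
# regex metacharacters a keyword might contain.
my $alternation = join '|', map { quotemeta } @all_keywords;

# Compile once, outside any per-file or per-keyword loop.
my $heading_re = qr{
    ^ [^\S\n]*                    # line start, optional leading blanks
    (                             # capture 1: the whole heading
        (?: [A-Z]+ [^\S\n]+ )*    # zero or more leading all-caps words
        ($alternation)            # capture 2: whichever keyword matched
        [^\S\n]+ AGREEMENT        # blanks, then the word AGREEMENT
    )
    [^\S\n]* $                    # optional trailing blanks, line end
}xm;

# One pass over the text collects every matching heading.
my @found;
while ($text =~ /$heading_re/g) {
    push @found, { heading => $1, keyword => $2 };
}

print "$_->{keyword}: $_->{heading}\n" for @found;
```

With the sample text above, this should report the SERVICE and PURCHASE headings and skip the lowercase body line. The point is that the regex engine walks each file once instead of once per keyword; whether that is the real bottleneck here depends on the answers to the questions above.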

> -----Original Message-----
> From: Craig Cardimon [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 15, 2005 3:28 PM
> To: [email protected]; ActivePerl
> Subject: Keyword search is dragging
>
> I'm searching large ASCII files for keywords. The keywords are part of
> section headings. These headings are in all caps on lines by themselves.
>
> The files sometimes contain HTML tags. My logic handles this well
> enough, but combs through the HTML very slowly. I'm dealing with tens of
> thousands of files, so speed counts.
>
> I thought I'd get around this by using HTML::TokeParser to remove any
> HTML before I searched each file. But now the script processes EVERY
> file slowly, taking a few seconds for each.
>
> Any suggestions on how I might optimize the following code, or what I
> could be doing better?
>
> -- Craig
>
> # slurp file into variable
> {
>     local $/;
>     $wholefile = <IN>;
> }
>
> # remove HTML tags from variable, leaving only text
> my $parser = HTML::TokeParser->new (\$wholefile);
> while (my $token = $parser->get_token)
> {
>     next unless $token->[0] eq 'T';
>     $wholefile2 = $wholefile2 . $token->[1];
> }
>
> foreach $keyword (@all_keywords)
> {
>     my $re = qr
>     {
>         (                    # start of $1 variable
>             (                # start of a group
>                 (\w+[A-Z])+  # one or more words in caps
>                 \s+          # one or more spaces
>             )*               # zero or more groups
>             $keyword         # the $keyword variable
>             \s+              # one or more spaces
>             AGREEMENT        # the word "AGREEMENT"
>         )                    # end of $1 variable
>     }x;
>
>     my $wholeRE = qr{^\s*$re\s*$};
>
>     if($wholefile2 =~ /$wholeRE/gm)
>     {
>         # proceed
>     }
>
> }
>
> _______________________________________________
> Perl-Win32-Users mailing list
> [email protected]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
