On Thu, Jul 09, 2009 at 07:03:20PM -0400, Bernardo Rechea wrote: > For reference, the brute force algorithm: > > #### Brute force > T: foreach my $targetWord (@targetWords) { > foreach my $searchWord (@searchWords) { > if ($targetWord =~ /^$searchWord/) { > push @foundWords, $targetWord; > next T; > } > } > } > say $_ for @foundWords; > > > $ time ./search_prefixWords.pl wn-tmp/data.srtu wn-tmp/data.srtu.5chars > > out-brute.stdout > > real 21m54.758s > user 21m44.346s > sys 0m2.772s
Okay, I figured out why this code is so slow compared to mine. You're iterating over the words and then the prefixes. Each time through the inner loop, perl has to compile the regex: n*m regex compilations. If you swap the loops, then perl compiles the regex for each prefix just once: m regex compilations. Original: real 24m26.004s user 24m13.461s sys 0m3.160s Swapped: real 3m3.885s user 3m3.342s sys 0m0.280s Note that swapping the loops means you may need to unique the list of found words, but that's very fast to do. Alternatively, you could precompile the regexes using qr//. Ronald _______________________________________________ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm