Re: [Boston.pm] binary search on a list of sorted strings in memory

Ronald J Kimball Fri, 10 Jul 2009 19:41:56 -0700

On Thu, Jul 09, 2009 at 07:03:20PM -0400, Bernardo Rechea wrote:

> For reference, the brute force algorithm:
> 
> #### Brute force
> T: foreach my $targetWord (@targetWords) {
>     foreach my $searchWord (@searchWords) {
>         if ($targetWord =~ /^$searchWord/) {
>             push @foundWords, $targetWord;
>             next T;
>         }
>     }
> }
> say $_ for @foundWords;
> 
> 
> $ time ./search_prefixWords.pl wn-tmp/data.srtu wn-tmp/data.srtu.5chars  > 
> out-brute.stdout
> 
> real    21m54.758s
> user    21m44.346s
> sys     0m2.772s


Okay, I figured out why this code is so slow compared to mine.  You're
iterating over the words and then the prefixes.  Each time through the
inner loop, perl has to compile the regex: n*m regex compilations.  If you
swap the loops, then perl compiles the regex for each prefix just once: m
regex compilations.

Original:

real   24m26.004s
user   24m13.461s
sys    0m3.160s

Swapped:

real   3m3.885s
user   3m3.342s
sys    0m0.280s

Note that swapping the loops means you may need to unique the list of found
words, but that's very fast to do.  Alternatively, you could precompile the
regexes using qr//.

Ronald

_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm

Re: [Boston.pm] binary search on a list of sorted strings in memory

Reply via email to