> no wonder it took so long. you matched the null string between each pair
> of word boundaries. you need a +, not * there.
Thanks.
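For anyone following along, here is a minimal sketch of why the quantifier matters. The sample string is made up for illustration and is not from the original script. With *, the character class is allowed to match nothing at all, so the engine also reports empty matches in the gaps between words; with +, every match has to contain at least one character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input, only for illustration.
my $text = "it's a test";

# With *, the pattern may succeed without consuming any characters,
# so the list of matches also contains empty strings.
my @star = $text =~ m/([\w'-]*)/g;

# With +, each match must contain at least one character.
my @plus = $text =~ m/([\w'-]+)/g;

my $empty_star = grep { $_ eq '' } @star;
my $empty_plus = grep { $_ eq '' } @plus;

printf "star: %d matches, %d of them empty\n", scalar @star, $empty_star;
printf "plus: %d matches, %d of them empty\n", scalar @plus, $empty_plus;
```

Every empty match still costs a trip through whatever code is attached to the match, which would account for the extra run time.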
> i understand the boolean thing as i said previously. i was asking why
> you used it there. i see no reason if all you are doing is word
> counting.
Yeah, that's what I said. We realized we didn't need it for the
unique words. What we were doing originally though was pulling the
unique occurrences out of a string of text:
$a = 'abcde' x 200;
What are the unique occurrences of text in that string? That's what
the regex was solving.
The original purpose of the regex is still valid; it's just what I did
with it that was wrong.
> $unique{$1}++ while $text =~ m/([\w'-]+)/g ;
>
> use the benchmark module to compare the speeds. make sure you don't do
> destructive parsing which some of your examples seem to do.
#!/usr/bin/perl -w
use strict;
use File::Slurp;
use Benchmark qw( cmpthese );
my $text = read_file( './kjv10.txt' );
my %unique;
sub substitution {
    $text =~ s{(([\w'-]+)(?{$unique{$^N}++}))}{$1}g;
    %unique = ();
}
sub while_loop1 {
    1 while $text =~ m{(([\w'-]+)(?{!$unique{$^N}++}))}g;
    %unique = ();
}
sub while_loop2 {
    $unique{$1}++ while $text =~ m/([\w'-]+)/g;
    %unique = ();
}
cmpthese( -60, {
'substitution' => \&substitution,
'while loop 1' => \&while_loop1,
'while loop 2' => \&while_loop2,
});
             s/iter substitution while loop 1 while loop 2
substitution   2.97           --         -33%         -61%
while loop 1   2.00          49%           --         -42%
while loop 2   1.15         159%          73%           --
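
Before reading too much into timings like these, it may be worth checking that the variants really count the same thing and leave the input alone. The following is only a sketch under stated assumptions: the sample text and the hash names (%by_subst, %by_code_block, %by_loop) and the helper flatten() are made up here, and the patterns are written in the style of the three subs above rather than copied from them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the real input file.
my $text   = "the cat and the hat and the bat";
my $before = $text;

my (%by_subst, %by_code_block, %by_loop);

# In the style of 'substitution': count inside the regex and put each
# match back unchanged.
$text =~ s{(([\w'-]+)(?{ $by_subst{$^N}++ }))}{$1}g;

# In the style of 'while loop 1': count inside the regex.
1 while $text =~ m{(([\w'-]+)(?{ $by_code_block{$^N}++ }))}g;

# In the style of 'while loop 2': count in the loop body.
$by_loop{$1}++ while $text =~ m/([\w'-]+)/g;

# Hypothetical helper: turn a hash into a sorted "key=count" string so
# that two hashes can be compared with eq.
sub flatten {
    my ($h) = @_;
    return join ',', map { "$_=$h->{$_}" } sort keys %$h;
}

my $agree = flatten(\%by_subst) eq flatten(\%by_loop)
         && flatten(\%by_code_block) eq flatten(\%by_loop);

print "counts agree:   ", ($agree ? "yes" : "no"), "\n";
print "text untouched: ", ($text eq $before ? "yes" : "no"), "\n";
```

If the counts disagree, or the text changes between runs, the benchmark numbers are comparing different amounts of work.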
--
Alan