I know, replying to myself. Parsing the KJV Bible took about 7 seconds with this:
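#!/usr/bin/perl -w
use strict;

# Slurp the whole file into one string.
my $text = do {
    open my $T, '<', './kjv10.txt' or die "Couldn't open kjv10.txt: $!\n";
    local $/;
    <$T>;
};

# Count each word inside the regex: the (??{...}) block bumps the count for
# the word just captured ($^N) and lets the match succeed only the first
# time that word is seen.
my %unique;
$text =~ s{(
    (\b\w+(?:['-]+\w+)*\b)
    (??{!$unique{$^N}++?"(?=)":"(?!)"})
)
}{
$1
}xg;

# Print every distinct word with its count.
print "$_ => $unique{$_}\n" for sort keys %unique;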
(It was pointed out that that's not a completely fair timing ... we
had to load the whole 4 MB file into memory.)
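
For comparison, here is a minimal sketch of a line-at-a-time version that never holds the whole file in memory. It reuses the word pattern from the script above but counts each match with a plain hash instead of a (??{...}) block. The count_words sub and taking the filename from the command line are assumptions made for illustration, and this variant has not been timed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count words from an open filehandle,
# reading one line at a time instead of slurping the file.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Same word pattern as the script above: runs of \w characters,
        # optionally joined by apostrophes or hyphens.
        $count{$1}++ while $line =~ /\b(\w+(?:['-]+\w+)*)\b/g;
    }
    return \%count;
}

# Only read a file when one is named on the command line.
if (@ARGV) {
    open my $T, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!\n";
    my $count = count_words($T);
    close $T;
    print "$_ => $count->{$_}\n" for sort keys %$count;
}
```

Run it as, say, `perl count.pl kjv10.txt` (the script name is made up). Since the pattern contains no whitespace, it cannot match across a line break anyway, so reading by line should not change which words are found.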
--
Alan
