I know, replying to myself.

Parsing the KJV Bible took about 7 seconds with this:

#!/usr/bin/perl -w

use strict;

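my $text = do {
  open my $T, '<', './kjv10.txt' or die "Couldn't open kjv10.txt: $!\n";
  local $/;    # undef record separator: slurp the whole file at once
  <$T>;
};

my %unique;

# $unique{$^N}++ bumps the count for every occurrence of a word. The
# (??{ }) block returns "(?=)" (always succeeds) the first time a word is
# seen and "(?!)" (always fails) after that, so only first occurrences
# are substituted. Caveat: for a repeated word containing ' or -, such as
# LORD's, the forced failure makes the engine backtrack and retry, so the
# fragments "LORD" and "s" get counted as well.
$text =~ s{(
             (\b\w+(?:['-]+\w+)*\b)
             (??{!$unique{$^N}++?"(?=)":"(?!)"})
           )
          }{$1}xg;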

print "$_ => $unique{$_}\n" for sort keys %unique;
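To see the (??{ }) trick in isolation, here is a minimal sketch on a short string. The sub name first_occurrences is hypothetical, and the simpler \b\w+\b word pattern is an assumption made to keep the example small. In list context m//g returns only the words that matched on their first sighting, while the hash still counts every occurrence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: returns the words in
# order of first appearance, plus a hash counting every occurrence.
sub first_occurrences {
    my ($text) = @_;
    my %seen;
    my @first = $text =~ m{
        ( \b\w+\b )                              # a simple word, assumed pattern
        (??{ !$seen{$^N}++ ? "(?=)" : "(?!)" })  # succeed only on first sighting
    }xg;
    return (\@first, \%seen);
}

my ($first, $seen) = first_occurrences("the cat and the hat");
print "@$first\n";                              # prints "the cat and hat"
print "$_ => $seen->{$_}\n" for sort keys %$seen;
```

The second "the" is rejected by "(?!)" and never returned, yet %$seen still records the => 2.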

(It was pointed out that the 7-second figure isn't a completely fair
timing, since we had to load the whole 4 MB file into memory.)
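If holding the whole file in memory is the concern, one alternative is to read it a line at a time and count with a plain //g loop. This is a sketch under assumptions: it drops the (??{ }) trick entirely, keeps the original word pattern, and the count_words name and command-line handling are hypothetical. No timing is claimed for it. Since no match is ever forced to fail, each occurrence is counted once, under its full token.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count words from an already-open filehandle,
# one line at a time, so only the current line is held in memory.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Same word pattern as the slurping version.
        $count{$1}++ while $line =~ /(\b\w+(?:['-]+\w+)*\b)/g;
    }
    return \%count;
}

# Run as: perl count_words.pl kjv10.txt
if (!caller && @ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Couldn't open $file: $!\n";
    my $count = count_words($fh);
    close $fh;
    print "$_ => $count->{$_}\n" for sort keys %$count;
}
```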
--
Alan
