On Sun, Feb 12, 2012 at 04:31:49AM +0100, Xavier Noria wrote:
> On Sun, Feb 12, 2012 at 4:12 AM, Xavier Noria <f...@hashref.com> wrote:
> 
> > Nowadays long strings get a performance boost. That does not make
> > sense: statistically speaking, English words should be the fast ones.
> >
> 
> Indeed, running the benchmark against /usr/share/dict/words gives an
> overall speedup of more than 7x:
> 
>     https://gist.github.com/1806049
> 
> The bigger the sample, the greater the speedup, because special-cased
> inflections are the exception. The majority of words are inflected by
> the last rule, so the difference in looping technique matters more.
> 
> This should also hold in real life: the exceptions are rare, and in
> general most words will be inflected by the last rule.

Interesting.  Have you investigated expanding the regular expressions
and doing hash-based replacement via gsub!?  Since we know the
replacements in advance, it's possible to compile them into a hash and
use it for the replacement.  If the hash misses, we can fall back to a
linear scan.
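
For context, the scan being measured looks roughly like this (a
schematic sketch with invented rules, not ActiveSupport's actual list,
which is longer and built via Inflector.inflections):

  # Rules are tried in order and the first match wins.  The catch-all
  # at the end is the one most regular English words reach, so scanning
  # the whole list is the common case.
  RULES = [
    [/(ox)\z/i,         '\1en'],
    [/([^aeiouy])y\z/i, '\1ies'],
    [/\z/,              "s"],     # catch-all: just append "s"
  ]

  def pluralize(word)
    RULES.each do |pattern, replacement|
      return word.sub(pattern, replacement) if word =~ pattern
    end
    word
  end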

Here's a quick implementation as an example.  We could probably optimize
more of the expressions:

  https://gist.github.com/1806575
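
The heart of the trick is that sub and gsub accept a Hash as the
replacement: when the pattern matches, the matched text is looked up in
the hash.  So the known cases can be compiled ahead of time into one
regexp plus one lookup table, keeping the rule scan as the slow path.
A stripped-down sketch (the names and word list are invented, and the
gist covers more of the expressions):

  # Irregulars compiled into a single alternation and a single lookup
  # table.
  IRREGULAR = {
    "person" => "people",
    "man"    => "men",
    "child"  => "children",
  }
  IRREGULAR_RE = /\A(?:#{Regexp.union(IRREGULAR.keys)})\z/

  def fast_pluralize(word)
    if word =~ IRREGULAR_RE
      # Hit: one regexp match plus one hash lookup.
      word.sub(IRREGULAR_RE, IRREGULAR)
    else
      # Miss: fall back to the linear scan over the rule list.
      pluralize(word)
    end
  end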

-- 
Aaron Patterson
http://tenderlovemaking.com/
