At Fri Feb 04 18:53:09 -0500 2011, Uri Guttman wrote:
> that will kill your cpu. alternations are very slow since they have to
> go back and try from the beginning of the list each time.

Since we're talking about literals, this hasn't been true since 2007,
with the release of perl 5.10.  Perl now uses a Aho-Corasick trie
algorithm internally for literal alternations, which allows for
matching without backtracking:

    $ perl -le 'use re Debug => "TRIE";print if /(or|for|food|foo|feed)/'
    Compiling REx "(or|for|food|foo|feed)"
        TRIE(NATIVE): W:5 C:16 Uq:5 Min:2 Max:4
          Char : Match Base  Ofs     o   r   f   d   e
          State|-------------------------------------------
          #   1|       @   5 + 0[    2   .   4   .   .]
          #   2|       @   5 + 1[    .   3   .   .   .]
          #   3| W   1 @   0=20
          #   4|       @   8 + 0[    5   .   .   .   9]
          #   5|       @   D + 0[    7   6   .   .   .]
          #   6| W   2 @   0=20
          #   7| W   4 @   6 + 3[    .   .   .   8   .]
          #   8| W   3 @   0=20
          #   9|       @   6 + 4[    .   .   .   .   A]
          #   A|       @   8 + 3[    .   .   .   B   .]
          #   B| W   5 @   0=20
    Stclass Failtable (12 states): 0, 0, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1
    Final program:
       1: OPEN1 (3)
       3:   TRIEC-EXACT<S:1/11 W:5 L:2/4 C:16/5>[fo] (18)
            <or>=20
            <for>=20
            <food>=20
            <foo>=20
            <feed>=20
      18: CLOSE1 (20)
      20: END (0)
    stclass AHOCORASICKC-EXACT<S:1/11 W:5 L:2/4 C:16/5>[fo] minlen 2
    Freeing REx: "(or|for|food|foo|feed)"

You can read more about it at the Aho-Corasick wikipedia article, or
http://www.regex-engineer.org/slides/img38.html  I'd be curious to see
how those trie modules on CPAN actually stack up, speed-wise, against
perl 5.10 and higher.

 - Alex
-- 
Networking -- only one letter away from not working

_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm

Reply via email to