At Fri Feb 04 18:53:09 -0500 2011, Uri Guttman wrote: > that will kill your cpu. alternations are very slow since they have to > go back and try from the beginning of the list each time.
Since we're talking about literals, this hasn't been true since 2007, with the release of perl 5.10. Perl now uses a Aho-Corasick trie algorithm internally for literal alternations, which allows for matching without backtracking: $ perl -le 'use re Debug => "TRIE";print if /(or|for|food|foo|feed)/' Compiling REx "(or|for|food|foo|feed)" TRIE(NATIVE): W:5 C:16 Uq:5 Min:2 Max:4 Char : Match Base Ofs o r f d e State|------------------------------------------- # 1| @ 5 + 0[ 2 . 4 . .] # 2| @ 5 + 1[ . 3 . . .] # 3| W 1 @ 0=20 # 4| @ 8 + 0[ 5 . . . 9] # 5| @ D + 0[ 7 6 . . .] # 6| W 2 @ 0=20 # 7| W 4 @ 6 + 3[ . . . 8 .] # 8| W 3 @ 0=20 # 9| @ 6 + 4[ . . . . A] # A| @ 8 + 3[ . . . B .] # B| W 5 @ 0=20 Stclass Failtable (12 states): 0, 0, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1 Final program: 1: OPEN1 (3) 3: TRIEC-EXACT<S:1/11 W:5 L:2/4 C:16/5>[fo] (18) <or>=20 <for>=20 <food>=20 <foo>=20 <feed>=20 18: CLOSE1 (20) 20: END (0) stclass AHOCORASICKC-EXACT<S:1/11 W:5 L:2/4 C:16/5>[fo] minlen 2 Freeing REx: "(or|for|food|foo|feed)" You can read more about it at the Aho-Corasick wikipedia article, or http://www.regex-engineer.org/slides/img38.html I'd be curious to see how those trie modules on CPAN actually stack up, speed-wise, against perl 5.10 and higher. - Alex -- Networking -- only one letter away from not working _______________________________________________ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm