Nobody wrote:
On Tue, 14 Jul 2009 02:06:04 -0300, Gabriel Genellina wrote:

Matt, how many words are you looking for, in how long a string ?
Were you able to time any( substr in long_string ) against re.compile
( "|".join( list_items )) ?
There is a known algorithm to solve specifically this problem (Aho-Corasick), a good implementation should perform better than R.E. (and better than the gen.expr. with the advantage of returning WHICH string matched)

Aho-Corasick has the advantage of being linear in the length of the
patterns, so the setup may be faster than re.compile(). The actual
searching won't necessarily be any faster (assuming optimal
implementations; I don't know how safe that assumption is).

Having done a fast Aho-Corasick implementation myself, I can assure you
that the actual searching can be incredibly fast.  RE conversion usually
goes to a slightly more general machine than the Aho-Corasick processing
requires.

--Scott David Daniels
scott.dani...@acm.org

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to