Doug Cutting wrote:
Andrzej Bialecki wrote:

100k regexps is still a lot, so I'm not totally sure it would be much faster, but perhaps worth checking.


I have worked with this type of technology before (minimized, determinized FSAs, constructed from large sets of strings & expressions) and it should be very fast to perform lookups, even in large, complex FSAs. Construction of the FSA can be time consuming and should probably be done offline, not at fetcher startup time, so that it is only performed once for a number of fetcher runs.

Guess what... this library supports (de)serialization of automata, so they can be compiled once, and then just stored/loaded.
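
To make the compile-once, load-many idea concrete, here is a minimal sketch. It is not the library's API: it assumes plain literal URL prefixes instead of regexps, uses an unminimized trie as the deterministic automaton, and uses JSON as a stand-in for the library's own (de)serialization. All function names are hypothetical.

```python
import json


def build_automaton(prefixes):
    """Build a deterministic automaton (a trie) from literal prefixes.

    This is the slow step that would be run offline, once.
    """
    transitions = [{}]  # transitions[state] maps a character to the next state
    accepting = set()   # states at which some prefix ends
    for prefix in prefixes:
        state = 0
        for ch in prefix:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return {"transitions": transitions, "accepting": accepting}


def serialize(automaton):
    """Turn the compiled automaton into text that can be stored on disk."""
    return json.dumps({
        "transitions": automaton["transitions"],
        "accepting": sorted(automaton["accepting"]),
    })


def deserialize(text):
    """Restore an automaton from serialize() output; no rebuilding needed."""
    data = json.loads(text)
    return {"transitions": data["transitions"],
            "accepting": set(data["accepting"])}


def matches(automaton, url):
    """Return True if any stored prefix is a prefix of url.

    The loop takes at most one step per character of url, so the cost
    does not grow with the number of prefixes compiled in.
    """
    transitions = automaton["transitions"]
    accepting = automaton["accepting"]
    state = 0
    if state in accepting:  # an empty prefix matches everything
        return True
    for ch in url:
        state = transitions[state].get(ch)
        if state is None:
            return False
        if state in accepting:
            return True
    return False


# Offline: compile once and store the result.
stored = serialize(build_automaton(["http://example.com/",
                                    "http://example.org/docs"]))

# At fetcher startup: load the stored automaton and start matching.
automaton = deserialize(stored)
print(matches(automaton, "http://example.com/index.html"))
```

A real regexp automaton has far more complex construction, but the lookup loop and the store/load split look much the same.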

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
