Hi!

Given a string, I want to find all ocurrences of
certain predefined words in that string. Problem is, the list of
words that should be detected can be in the order of thousands.

With the re module, this can be solved something like this:

import re

r = re.compile("word1|word2|word3|.......|wordN")
r.findall(some_string)

Unfortunately, when having more than about 10 000 words in
the regexp, I get a regular expression runtime error when
trying to execute the findall function (compile works fine, but slow).

I don't know if using the re module is the right solution here, any
suggestions on alternative solutions or data structures which could
be used to solve the problem?

André

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to