Hi, I'm happy to announce the "almost first" release of the acora package.
It can be downloaded from PyPI.

http://pypi.python.org/pypi/acora/1.1


What is Acora?
---------------

Acora is 'fgrep' for Python, a fast multi-keyword text search engine.

Based on a set of keywords, it generates a search automaton (DFA) and
runs it over string input, either unicode or bytes.

It is based on the Aho-Corasick algorithm and an NFA-to-DFA powerset
construction.

Acora comes with both a pure Python implementation and a fast binary
module written in Cython.


Features
---------

* works with unicode strings and byte strings
* about 2-3x as fast as Python's regular expression engine
* finds overlapping matches, i.e. all matches of all keywords
* support for case insensitive search (~10x as fast as 're')
* frees the GIL while searching
* additional (slow but short) pure Python implementation
* support for Python 2.5+ and 3.x
* support for searching in files
* permissive BSD license


How do I use it?
-----------------

Import the package::

    >>> from acora import AcoraBuilder

Collect some keywords::

    >>> builder = AcoraBuilder('ab', 'bc', 'de')
    >>> builder.add('a', 'b')

Generate the Acora search engine for the current keyword set::

    >>> ac = builder.build()

Search a string for all occurrences::

    >>> ac.findall('abc')
    [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
    >>> ac.findall('abde')
    [('a', 0), ('ab', 0), ('b', 1), ('de', 2)]

Iterate over the search results as they come in::

    >>> for kw, pos in ac.finditer('abde'):
    ...     print("%2s[%d]" % (kw, pos))
     a[0]
    ab[0]
     b[1]
    de[2]

-- 
http://mail.python.org/mailman/listinfo/python-announce-list

        Support the Python Software Foundation:
        http://www.python.org/psf/donations/
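
For readers curious about the Aho-Corasick idea mentioned under "What is Acora?", here is a rough, self-contained sketch in plain Python. It is not Acora's code and does not use the acora package: the names ``build_automaton`` and ``findall`` are made up for this illustration, and the sketch follows failure links at search time instead of performing the NFA-to-DFA powerset construction that Acora uses. It only aims to show how a single pass over the input can report all overlapping keyword matches as (keyword, start position) pairs.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with failure links (hypothetical helper, not Acora's API)."""
    goto = [{}]   # goto[state] maps a character to the next state
    fail = [0]    # fail[state] is the state to fall back to on a mismatch
    out = [[]]    # out[state] lists the keywords that end at this state

    # Step 1: insert every keyword into the trie.
    for kw in keywords:
        state = 0
        for ch in kw:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        if kw not in out[state]:
            out[state].append(kw)

    # Step 2: breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Keywords ending at the fallback state also end here (shorter suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def findall(automaton, text):
    """Return all (keyword, start_position) matches, including overlapping ones."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            matches.append((kw, i - len(kw) + 1))
    return matches


automaton = build_automaton(['ab', 'bc', 'de', 'a', 'b'])
print(findall(automaton, 'abc'))
print(findall(automaton, 'abde'))
```

With the same keyword set as in the usage example above, this sketch produces the same (keyword, position) pairs for 'abc' and 'abde'. Precomputing a full transition table, as a DFA does, would remove the inner ``while`` loop at the cost of a larger automaton.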