ANN: acora 1.1 - 'fgrep' for Python

Stefan Behnel Sat, 30 Jan 2010 05:15:43 -0800

Hi,

I'm happy to announce the "almost first" release of the acora package.


It can be downloaded from PyPI.

http://pypi.python.org/pypi/acora/1.1


What is Acora?
---------------

Acora is 'fgrep' for Python, a fast multi-keyword text search engine.

Based on a set of keywords, it generates a search automaton (DFA) and
runs it over string input, either unicode or bytes.

It is based on the Aho-Corasick algorithm and an NFA-to-DFA powerset
construction.

Acora comes with both a pure Python implementation and a fast binary module
written in Cython.


Features
---------

* works with unicode strings and byte strings
* about 2-3x as fast as Python's regular expression engine
* finds overlapping matches, i.e. all matches of all keywords
* support for case insensitive search (~10x as fast as 're')
* frees the GIL while searching
* additional (slow but short) pure Python implementation
* support for Python 2.5+ and 3.x
* support for searching in files
* permissive BSD license


How do I use it?
-----------------

Import the package::

    >>> from acora import AcoraBuilder

Collect some keywords::

    >>> builder = AcoraBuilder('ab', 'bc', 'de')
    >>> builder.add('a', 'b')

Generate the Acora search engine for the current keyword set::

    >>> ac = builder.build()

Search a string for all occurrences::

    >>> ac.findall('abc')
    [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
    >>> ac.findall('abde')
    [('a', 0), ('ab', 0), ('b', 1), ('de', 2)]

Iterate over the search results as they come in::

    >>> for kw, pos in ac.finditer('abde'):
    ...     print("%2s[%d]" % (kw, pos))
     a[0]
    ab[0]
     b[1]
    de[2]
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

        Support the Python Software Foundation:
        http://www.python.org/psf/donations/

ANN: acora 1.1 - 'fgrep' for Python

Reply via email to