Regular expressions for that sort of thing can get *really* big.  The
most efficient approach is to compose the regular expression
programmatically so that it is as exact as possible:

import re

def permutation(lst):
        """
        From http://labix.org/snippets/permutations/. Computes permutations
        of a list iteratively.
        """
        queue = [-1]
        lenlst = len(lst)
        while queue:
                i = queue[-1]+1
                if i == lenlst:
                        queue.pop()
                elif i not in queue:
                        queue[-1] = i
                        if len(queue) == lenlst:
                                yield [lst[j] for j in queue]
                        queue.append(-1)
                else:
                        queue[-1] = i

def segment_re(a, b):
        """
        Creates a grouped regular expression pattern to match the text
        between any permutation of the three-letter sets a and b.
        """
        def pattern(n):
                # One alternative per ordering of the letters: (abc|acb|bac|...)
                return "(%s)" % '|'.join([''.join(grp) for grp in permutation(n)])

        return re.compile( r'%s(\w+?)%s' % (pattern(a), pattern(b)) )

# Print the generated pattern string rather than the compiled object's repr.
print(segment_re(["a", "b", "c"], ["d", "e", "f"]).pattern)

You could extend segment_re to accept an integer that replaces the
open-ended (\w+?) with a bounded quantifier, such as \w{3,n} to match
from 3 to n characters.  This will grow the compiled expression in
memory but can make matching faster, since the engine stops extending
the middle group once the limit is reached.
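
A minimal sketch of that extension, under a few assumptions: the name
bounded_segment_re and the min_len/max_len parameters are hypothetical,
and the standard library's itertools.permutations is swapped in for
the permutation() generator above only to keep the example
self-contained.  I have not benchmarked it, so treat any speed gain as
something to measure.

```python
import re
from itertools import permutations


def bounded_segment_re(a, b, min_len=1, max_len=None):
    """
    Like segment_re, but the middle group is limited to between
    min_len and max_len word characters (hypothetical parameters).
    """
    def pattern(letters):
        # One alternative per ordering of the letters: (abc|acb|bac|...)
        return "(%s)" % "|".join(
            re.escape("".join(p)) for p in permutations(letters)
        )

    if max_len is None:
        # No upper bound: non-greedy, at least min_len characters.
        middle = r"(\w{%d,}?)" % min_len
    else:
        # Bounded and still non-greedy.
        middle = r"(\w{%d,%d}?)" % (min_len, max_len)

    return re.compile(pattern(a) + middle + pattern(b))


rx = bounded_segment_re(["a", "b", "c"], ["d", "e", "f"], min_len=3, max_len=5)

# Five word characters sit between "cab" and "fed", so this matches.
hit = rx.search("xxcab12345fedxx")

# Six characters exceed max_len, so this does not match.
miss = rx.search("cab123456fed")
```

Here hit.groups() gives the leading permutation, the text in between,
and the trailing permutation, while miss is None because the text in
between is longer than max_len.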

See http://artfulcode.net/articles/optimizing-regular-expressions/ for
specifics on optimizing regexes.