Re: Named regexp variables, an extension proposal.

Paul McGuire Sun, 14 May 2006 12:15:47 -0700

"Paddy" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I have another use case.
> If you want to match a comma separated list of words you end up writing
> what constitutes a word twice, i.e:
>   r"\w+[,\w+]"
> As what constitutes a word gets longer, you have to repeat a longer RE
> fragment so the fact that it is a match of a comma separated list is
> lost, e.g:
>   r"[a-zA-Z_]\w+[,[a-zA-Z_]\w+]"
>
> - Paddy.
>
Write a short function that returns the RE for a comma-separated list.  The
item pattern is then written only once (DRY), and an optional delim argument
lets you generalize to lists delimited by dots, dashes, etc.


(Note - your posted RE requires words of at least two characters - I think
you meant "[A-Za-z_]\w*", not "[A-Za-z_]\w+".)
-- Paul
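
On the two-character point above, here is a small check. The sample names
are made up for illustration. With \w+ after the first character, a
one-character identifier such as "x" is rejected, while \w* accepts it.

```python
import re

two_plus = re.compile(r"[A-Za-z_]\w+")   # first char plus at least one more
one_plus = re.compile(r"[A-Za-z_]\w*")   # first char plus zero or more

# Print, for each sample name, whether each pattern matches the whole string.
for name in ("x", "_", "x1", "total"):
    print(name, bool(two_plus.fullmatch(name)), bool(one_plus.fullmatch(name)))
```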


import re

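def commaSeparatedList(regex, delim=","):
    # (?:...) groups the repeated "delim + item" part; square brackets would
    # be a character class.  re.escape keeps delimiters like "." literal.
    return "%s(?:%s%s)*" % (regex, re.escape(delim), regex)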

listOfWords = re.compile( commaSeparatedList(r"\w+") )
listOfIdents = re.compile( commaSeparatedList(r"[A-Za-z_]\w*") )

# might be more robust - people put whitespace in the darndest places!
def whitespaceTolerantCommaSeparatedList(regex, delim=","):
    # group (not a character class); optional whitespace around the delimiter
    return r"%s(?:\s*%s\s*%s)*" % (regex, re.escape(delim), regex)
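
A quick way to check helpers like these is to compile the pattern and try it
with fullmatch. The sketch below is self-contained; delimited_list_re and the
sample strings are hypothetical names for illustration, not part of any
library. It assumes the delimiter is a literal string (so it goes through
re.escape), and it groups the repeated part with (?:...), because square
brackets would form a character class rather than a group.

```python
import re

def delimited_list_re(item, delim=",", allow_space=False):
    """Return regex source for one or more `item`s separated by `delim`.

    Hypothetical helper for illustration: `item` is a regex fragment,
    `delim` is treated as a literal string.
    """
    sep = re.escape(delim)
    if allow_space:
        sep = r"\s*" + sep + r"\s*"
    # Wrap item in a non-capturing group so alternations like "a|b" stay intact.
    return "(?:{0})(?:{1}(?:{0}))*".format(item, sep)

idents = re.compile(delimited_list_re(r"[A-Za-z_]\w*"))
loose = re.compile(delimited_list_re(r"[A-Za-z_]\w*", allow_space=True))
dotted = re.compile(delimited_list_re(r"\w+", delim="."))

print(bool(idents.fullmatch("a,b_2,c")))       # strict list, no spaces
print(bool(idents.fullmatch("a, b")))          # space is rejected here
print(bool(loose.fullmatch("a , b,c")))        # whitespace-tolerant variant
print(bool(dotted.fullmatch("os.path.join")))  # dot-delimited list
```

Using fullmatch rather than match matters here: match would accept a string
as soon as its first item matches, even if trailing text is not a valid list.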


# (BTW, delimitedList in pyparsing does this too - the default delimiter is
# a comma, but other expressions can be used too)
from pyparsing import Word, delimitedList, alphas, alphanums

listOfWords = delimitedList( Word(alphas) )
listOfIdents = delimitedList( Word(alphas+"_", alphanums+"_") )


-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list
