Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>  that's not a very efficient way to match multiple patterns, though.  a
>  much better way is to combine the patterns into a single one, and use
>  the "lastindex" attribute to figure out which one that matched.

lastindex is useful, yes.
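Here is a minimal sketch of the combined-pattern technique, using a hypothetical list of token patterns (not the ones from this thread). Each small pattern goes in its own capturing group, the groups are joined with "|", and `lastindex` reports which one matched.

```python
import re

# Hypothetical token patterns, one per kind (illustration only).
# They must not contain capturing groups of their own, or the group
# numbers will no longer line up with their positions in this list.
patterns = [
    r"\d+",           # 1. integers
    r"[A-Za-z_]\w*",  # 2. names
    r"\s+",           # 3. whitespace
    r".",             # 4. anything else
]

# One big regexp: (p1)|(p2)|(p3)|(p4)
combined = re.compile("|".join("(%s)" % p for p in patterns))

def tokens(text):
    """Yield (kind_number, token_text) pairs; kind_number is 1-based."""
    for m in combined.finditer(text):
        # Only one alternative can match, so the last matched group
        # is the group for the pattern that matched.
        yield m.lastindex, m.group()
```

For example, `list(tokens("ab 12"))` gives `[(2, 'ab'), (3, ' '), (1, '12')]`.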

> see
> 
>      http://effbot.org/zone/xml-scanner.htm
> 
>  for more on this topic.

I take your point. However, I don't find the code below very readable:
turning 5 small regexps into 1 big one, plus a game of count-the-brackets,
doesn't strike me as a huge win...

import re

xml = re.compile(r"""
    <([/?!]?\w+)     # 1. tags
    |&(\#?\w+);      # 2. entities
    |([^<>&'\"=\s]+) # 3. text strings (no special characters)
    |(\s+)           # 4. whitespace
    |(.)             # 5. special characters
    """, re.VERBOSE)
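One way to keep the five small regexps readable without counting brackets is to give each value group a name with `(?P<name>...)` and read `Match.lastgroup` instead of `lastindex`. This is sketched here as an alternative, not as what the linked article does, and the group names are chosen for illustration.

```python
import re

# The same five alternatives as above, kept on separate lines and with
# named groups, so the match is identified by name rather than by number.
TOKEN_PATTERNS = [
    r"<(?P<tag>[/?!]?\w+)",      # tags
    r"&(?P<entity>\#?\w+);",     # entities
    r"(?P<text>[^<>&'\"=\s]+)",  # text strings (no special characters)
    r"(?P<space>\s+)",           # whitespace
    r"(?P<special>.)",           # special characters
]
xml = re.compile("|".join(TOKEN_PATTERNS))

def scan(source):
    """Yield (group_name, value) pairs for each token in source."""
    for m in xml.finditer(source):
        # lastgroup is the name of the group that matched.
        yield m.lastgroup, m.group(m.lastgroup)
```

Alternatives are still tried in list order, so the order of `TOKEN_PATTERNS` matters just as the order of the branches does in the original pattern.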

It's probably faster, though, so I give in gracelessly ;-)
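To put a number on "probably faster", here is a rough benchmark sketch. It compares trying the five regexps from the pattern above one at a time at each position against the single combined pattern with `lastindex`. The sample input is made up, timings will vary by machine and Python version, and no figures are claimed here.

```python
import re
import timeit

# The five small regexps, tried one at a time; the first match wins.
SEPARATE = [
    (1, re.compile(r"<([/?!]?\w+)")),      # tags
    (2, re.compile(r"&(\#?\w+);")),        # entities
    (3, re.compile(r"([^<>&'\"=\s]+)")),   # text strings
    (4, re.compile(r"(\s+)")),             # whitespace
    (5, re.compile(r"(.)")),               # special characters
]

# The same patterns joined into one alternation; group n is pattern n.
COMBINED = re.compile("|".join(p.pattern for _, p in SEPARATE))

def scan_separate(source):
    """Try each small regexp in turn at the current position."""
    pos, out = 0, []
    while pos < len(source):
        for kind, pattern in SEPARATE:
            m = pattern.match(source, pos)
            if m:
                out.append((kind, m.group(1)))
                pos = m.end()
                break
        else:
            pos += 1  # nothing matched here; skip one character
    return out

def scan_combined(source):
    """One pass with the combined regexp; lastindex says which matched."""
    return [(m.lastindex, m.group(m.lastindex)) for m in COMBINED.finditer(source)]

if __name__ == "__main__":
    sample = '<p class="x">fish &amp; chips</p>\n' * 200  # arbitrary input
    assert scan_separate(sample) == scan_combined(sample)
    for fn in (scan_separate, scan_combined):
        t = timeit.timeit(lambda: fn(sample), number=50)
        print(f"{fn.__name__}: {t:.3f}s")
```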

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
-- 
http://mail.python.org/mailman/listinfo/python-list
