New submission from wesley chun <wes...@gmail.com>:

In the re docs, the conditional regular expression syntax is described as follows:
(?(id/name)yes-pattern|no-pattern)
    Will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn’t. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<u...@host.com>' as well as 'u...@host.com', but not with '<u...@host.com'.

This regex is incomplete because it also accepts 'u...@host.com>'. When group 1 does not match and no-pattern is omitted, the conditional matches the empty string, so the trailing '>' is simply ignored:

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<u...@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'u...@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<u...@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'u...@host.com>'))
True

This error has existed since the feature was added in 2.4...

http://docs.python.org/release/2.4.4/lib/re-syntax.html

... through the 3.3 docs...

http://docs.python.org/dev/py3k/library/re.html#regular-expression-syntax

The fix is to add the end-of-string anchor '$' as the no-pattern, so that all four cases behave as expected:

>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<u...@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'u...@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<u...@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'u...@host.com>'))
False

If accepted, I propose this patch (also attached):

$ svn diff re.rst
Index: re.rst
===================================================================
--- re.rst (revision 88499)
+++ re.rst (working copy)
@@ -297,9 +297,9 @@
 ``(?(id/name)yes-pattern|no-pattern)``
    Will try to match with ``yes-pattern`` if the group with given *id* or *name*
    exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
-   can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
+   can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email
    matching pattern, which will match with ``'<u...@host.com>'`` as well as
-   ``'u...@host.com'``, but not with ``'<u...@host.com'``.
+   ``'u...@host.com'``, but not with ``'<u...@host.com'`` nor ``'u...@host.com>'`` .

----------
assignee: docs@python
components: Documentation, Regular Expressions
files: re.rst
messages: 129041
nosy: docs@python, wesley.chun
priority: normal
severity: normal
status: open
title: incorrect pattern in the re module docs for conditional regex
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file20833/re.rst

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11283>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
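For anyone wanting to reproduce the four checks from the report in one go, here is a minimal sketch. The address someone@example.org and the names OLD_PATTERN, NEW_PATTERN, CANDIDATES and check are hypothetical, chosen only for illustration; the two patterns are the ones quoted in the report.

```python
import re

# Pattern currently in the docs: no-pattern is omitted.
OLD_PATTERN = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
# Pattern proposed in the report: '$' is used as the no-pattern.
NEW_PATTERN = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

# Hypothetical test address, in the four bracket combinations from the report.
CANDIDATES = [
    '<someone@example.org>',   # both brackets
    'someone@example.org',     # no brackets
    '<someone@example.org',    # opening bracket only
    'someone@example.org>',    # closing bracket only
]


def check(pattern):
    """Return one bool per candidate: does re.match succeed with this pattern?"""
    return [bool(re.match(pattern, text)) for text in CANDIDATES]


if __name__ == '__main__':
    for label, pattern in (('old', OLD_PATTERN), ('new', NEW_PATTERN)):
        for text, matched in zip(CANDIDATES, check(pattern)):
            print(label, repr(text), matched)
```

With the old pattern only the third candidate is rejected; with the proposed pattern the fourth is rejected as well, since the conditional then requires the end of the string whenever group 1 did not participate in the match.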