[issue46350] re.sub, re.Match.expand, etc doesn't allow x, u, U, or N escapes in the template

Dan Snider Tue, 11 Jan 2022 14:13:52 -0800

New submission from Dan Snider <mr.assume.a...@gmail.com>:

The docs use the phrase "unknown escapes of ASCII letters are reserved for 
future use and treated as errors". That wording is ambiguous enough to raise 
the question of why "\x", "\u", "\U", and "\N{}" escapes aren't expanded in 
the template parameter the way they are in patterns.
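
A minimal sketch of the asymmetry described above, for anyone who wants to reproduce it. The helper name template_accepts is made up for this illustration; only re.sub and re.error come from the standard library. The last line shows one possible workaround (a callable replacement), not a proposed fix.

```python
import re


def template_accepts(template):
    """Return True if re.sub() accepts `template` as a replacement string.

    Hypothetical helper for this illustration only.
    """
    try:
        re.sub("a", template, "a")
    except re.error:
        return False
    return True


# In a pattern, \x41 is parsed by re itself and matches "A".
pattern_result = re.sub(r"\x41", "b", "A")

# In a template, the same escape is rejected, while \n is expanded.
hex_ok = template_accepts(r"\x41")
newline_ok = template_accepts(r"\n")

# A callable replacement is not parsed as a template, so the Python
# string literal "\x41" (already the character "A") is used as-is.
callable_result = re.sub("a", lambda m: "\x41", "a")
```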


Since I didn't get a response to the security report I submitted a few weeks 
ago about \N{} escapes, I'm cautiously assuming it's safe to bring it up here. 
The "unicode-escape" encoding, re, and probably everything else that uses it 
ignore two obvious clues that a name lookup will fail: the length of the name 
and the presence of invalid characters. I didn't look very hard for a definite 
length cap in the spec, but 255 seems more than sufficient, given that the 
longest name at present has 82 characters. Even something as absurd as 65535 
would be preferable to the current implementations, which keep going to the 
end, as in:

    >>> r"\N{%s}" % ("\ufb03"*2**30)

searching for a terminating "}" and still perform a lookup of the 
2**30-character name.
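
A rough sketch of the kind of pre-check meant here, written in Python rather than in the C code that actually does the lookup. The helper checked_lookup, the 255 cap, and the allowed-character set (ASCII letters, digits, space, hyphen) are assumptions made for this illustration, not something taken from the spec or from CPython.

```python
import unicodedata

# Cap suggested in this report; an assumption, not a value from any spec.
MAX_NAME_LENGTH = 255

# Assumed character repertoire of Unicode character names.
ALLOWED_CHARS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -")


def checked_lookup(name):
    """Reject hopeless names cheaply before calling unicodedata.lookup().

    Hypothetical helper; raises KeyError like unicodedata.lookup() does.
    """
    # Length is checked first so an enormous name is never scanned further.
    if len(name) > MAX_NAME_LENGTH:
        raise KeyError("character name too long")
    # isascii() guards against case mappings such as "ß".upper() == "SS".
    if not (name.isascii() and ALLOWED_CHARS.issuperset(name.upper())):
        raise KeyError("character name contains invalid characters")
    return unicodedata.lookup(name)
```

With this in place, checked_lookup("LATIN SMALL LIGATURE FFI") still returns the ligature, while a name made of "\ufb03" repeated many times fails on the length test alone.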

Another tangentially related "bug" (which probably deserves its own issue) is 
the inconsistency between group names and standard Python identifiers. The 
following example shows how the Python compiler decomposes the ligature 'ﬃ' in 
source code to the ASCII string "ffi", while re merely checks whether the name 
could be converted to an identifier:

    >>> import re
    >>> ﬃ = re.search("(?P<ﬃ>.)", "xxx")
    >>> ffi.groupdict()
    {'ﬃ': 'x'}
    >>> "\ufb03" in vars(), "\ufb03" in _
    (False, True)
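
The same inconsistency can be shown without relying on how the source file is tokenized. This is only a sketch: it assumes that NFKC normalization (which the compiler applies to identifiers) is the right thing to compare against, and the dict comprehension at the end is a possible user-side workaround, not a proposed change to re.

```python
import re
import unicodedata

LIGATURE = "\ufb03"  # LATIN SMALL LIGATURE FFI

# What the compiler does to an identifier spelled with the ligature.
normalized = unicodedata.normalize("NFKC", LIGATURE)

# re accepts the ligature as a group name and stores it unnormalized.
match = re.search("(?P<%s>.)" % LIGATURE, "xxx")
raw_groups = match.groupdict()

# User-side workaround: normalize the keys so they match identifiers.
normalized_groups = {
    unicodedata.normalize("NFKC", key): value
    for key, value in raw_groups.items()
}
```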

----------
messages: 410337
nosy: bup
priority: normal
severity: normal
status: open
title: re.sub, re.Match.expand, etc doesn't allow x, u, U, or N escapes in the 
template
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46350>
_______________________________________