[issue46410] TypeError when parsing regexp with unicode named character sequence escape

Jirka Marsik Mon, 17 Jan 2022 04:31:44 -0800


New submission from Jirka Marsik <jiri.mar...@oracle.com>:


re.compile(r"\N{name of Unicode Named Character Sequence}"), e.g.
re.compile(r"\N{KEYCAP NUMBER SIGN}"), raises a TypeError. The regular
expression parser relies on 'unicodedata' to look up character names. The
'unicodedata' module recently added support for Unicode Named Character
Sequences (https://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt).
Using one of these named character sequences in a regular expression leads to
a 'TypeError', because the regexp parser calls 'ord' on the looked-up string,
which has length > 1.
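
A minimal reproduction sketch of the report above. It assumes the interpreter's
'unicodedata' database includes KEYCAP NUMBER SIGN as a named sequence; the
variable names are only illustrative. The exact exception raised by re.compile
may depend on the Python version, so the sketch records the outcome rather than
asserting a specific one.

```python
import re
import unicodedata

# Example name taken from the report; it names a sequence, not one character.
name = "KEYCAP NUMBER SIGN"

# lookup() returns the whole sequence as a string of several code points.
seq = unicodedata.lookup(name)
print(name, "->", [hex(ord(ch)) for ch in seq])

# ord() only accepts a string of length 1, so it fails on the sequence.
ord_error = None
try:
    ord(seq)
except TypeError as exc:
    ord_error = exc
print("ord() failed with:", ord_error)

# The same lookup happens while the pattern is parsed. Which exception
# surfaces here (if any) is version-dependent, so just record it.
try:
    re.compile(r"\N{KEYCAP NUMBER SIGN}")
    outcome = "compiled"
except TypeError:
    outcome = "TypeError"
except re.error:
    outcome = "re.error"
print("re.compile outcome:", outcome)
```

As a possible workaround, re.compile(re.escape(unicodedata.lookup(name))) builds
a pattern that matches the sequence literally without using the \N{...} escape.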

----------
components: Regular Expressions
messages: 410770
nosy: ezio.melotti, jirkamarsik, mrabarnett
priority: normal
severity: normal
status: open
title: TypeError when parsing regexp with unicode named character sequence escape
type: behavior
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46410>
_______________________________________