[issue40980] group names of bytes regexes are strings

Quentin Wenger Tue, 16 Jun 2020 06:52:16 -0700


Quentin Wenger <wenger.quen...@bluewin.ch> added the comment:


> It seems you don't know some knowledge of encoding yet.

I don't have to be ashamed of my knowledge of encoding. Yet you are right that 
I was missing a subtlety, which is that latin-1 is a strict subset of Unicode 
rather than a completely arbitrary encoding. Thank you for that.
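
To make that subtlety concrete, here is a minimal sketch (an illustration only, not taken from the tracker discussion). It assumes nothing beyond the standard `latin-1` codec: decoding byte N as latin-1 yields the Unicode code point N, so the decoding cannot fail and the round trip is lossless.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps byte N to the Unicode code point N, so decoding cannot fail.
decoded = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in decoded]

# Encoding back gives the original bytes.
round_trip = decoded.encode("latin-1")

print(code_points == list(range(256)))
print(round_trip == all_bytes)
```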

So what you are saying is that group names in bytes regexes can only be 
written literally in the pattern (without any explicit encoding step), so de 
facto they are limited to the latin-1 subset.

Very well.

But then, once again:

1) why convert them to strings when returning them? Bytes they were when 
going in, bytes they should remain... **By converting them you are choosing an 
arbitrary encoding, even if it is the "natural" one.**
2) this limitation to the latin-1 subset is not compatible with the 
documentation, which says that valid Python identifiers are valid group names. 
If this were really the case, I would expect to be able to use any string 
for which .isidentifier() is true as a group name, programmatically.
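
Both points can be checked with a short sketch. The pattern and the variable names (`pattern`, `match`, `name`) are purely illustrative. The bytes pattern deliberately uses an ASCII group name, because how non-ASCII names are handled in bytes patterns may depend on the Python version.

```python
import re

# Point 1: a bytes pattern matched against bytes still reports str group names.
pattern = re.compile(rb"(?P<word>\w+)")
match = pattern.match(b"hello")

key_types = {type(k) for k in match.groupdict()}             # group names
value_types = {type(v) for v in match.groupdict().values()}  # matched text

# Point 2: a valid identifier that has no latin-1 representation.
name = "\u540d"  # a CJK character, chosen only as an example
is_identifier = name.isidentifier()
try:
    name.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# The same name is accepted as a group name in a str pattern.
str_pattern = re.compile("(?P<%s>x)" % name)

print(key_types, value_types)
print(is_identifier, fits_latin1, list(str_pattern.groupindex))
```

The first `print` shows the mismatch in point 1: the keys are `str` while the matched values are `bytes`. The second shows that a name can satisfy `.isidentifier()` and work in a str pattern while being impossible to express in latin-1.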

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________