Quentin Wenger <wenger.quen...@bluewin.ch> added the comment:
> It seems you don't know some knowledge of encoding yet.

I don't have to be ashamed of my knowledge of encoding. Yet you are right that I was missing a subtlety, which is that latin-1 is a strict subset of Unicode rather than a completely arbitrary encoding. Thank you for that.

So what you are saying is that group names in bytes regexes can only be specified directly (without -explicit- encoding), so de facto they are limited to the latin-1 subset. Very well. But then, once again:

1) why convert them to string when spitting them out? bytes they were when going in, bytes they should remain... **By converting them you are choosing an arbitrary encoding, even if it is the "natural" one.**

2) this limitation to the latin-1 subset is not compatible with the documentation, which says that valid Python identifiers are valid group names. If this was really the case, then I would expect to be able to use any string for which .isidentifier() is true as a group name, programmatically.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________
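For illustration, the two points raised in the comment can be reproduced with a minimal sketch. The pattern, the input, and the identifier `名前` are hypothetical examples chosen here, not taken from the issue. The sketch only shows that a bytes pattern reports its group names as `str`, and that a valid Python identifier need not be encodable in latin-1. It deliberately does not try a non-ASCII group name in a bytes pattern, since that behaviour may differ between Python versions.

```python
import re

# A bytes pattern with an ASCII group name (hypothetical example).
pattern = re.compile(rb"(?P<word>\w+)")
match = pattern.match(b"hello")

# Point 1: the matched value stays bytes, but the group name comes back as str.
groups = match.groupdict()
key_types = {type(key).__name__ for key in groups}
value_types = {type(value).__name__ for value in groups.values()}
print(groups, key_types, value_types)

# Point 2: a valid Python identifier that lies outside latin-1
# (hypothetical name, used only for illustration).
name = "名前"
is_identifier = name.isidentifier()

try:
    name.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    # No latin-1 byte sequence exists for this identifier, so it cannot
    # be written directly as a group name in a bytes pattern.
    fits_latin1 = False

print(is_identifier, fits_latin1)
```

The second half is the gap between "any string for which .isidentifier() is true" and what a bytes pattern can actually carry as a group name.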