Ezio Melotti added the comment:
> _ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
> "Other_ID_Start"}
> _ID_CATEGORIES = _ID_FIRST_CATEGORIES | {"Mn", "Mc", "Nd", "Pc",
> "Other_ID_Continue"}
Note that "Other_ID_Start" and "Other_ID_Continue" are not categories -- they
are properties -- and unicodedata.category() never returns them, so adding
them to these sets won't have any effect. I don't think there's a way to check
if chars have that property, but as I said in my previous message it's probably
safe to ignore them (nothing will explode even in the unlikely case that those
chars are used, right?).
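
To illustrate the point about categories, here is a minimal sketch, assuming
Python 3 and taking U+2118 and U+212E as sample chars (both are listed as
Other_ID_Start in PropList.txt). The names ID_FIRST_CATEGORIES and probe() are
made up for this example and are not part of the patch. It also prints
str.isidentifier() for comparison, since the Python 3 language reference
defines identifiers in terms of these properties; whether that is usable here
is a separate question.

```python
import unicodedata

# General categories copied from the quoted patch, minus the two property names.
ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def probe(char):
    """Return (general category, str.isidentifier() result) for one char."""
    return unicodedata.category(char), char.isidentifier()


# Sample chars that carry the Other_ID_Start property.
for char in ("\u2118", "\u212e"):
    cat, is_ident = probe(char)
    # category() only ever returns a two-letter general category,
    # so the property names can never match.
    print(f"U+{ord(char):04X}", cat, cat in ID_FIRST_CATEGORIES, is_ident)
```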
> def is_id_char(char):
>     return char in _ASCII_ID_CHARS or (
>         ord(char) >= 128 and
What's the reason for checking if the ord is >= 128?
>         category(normalize(char)[0]) in _ID_CATEGORIES
>     )
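
As a rough way to test whether the ord() guard changes the result, the sketch
below compares the category test alone against an ASCII whitelist for every
code point below 128. It rests on two assumptions that the quoted patch does
not confirm: that _ASCII_ID_CHARS is ASCII letters, digits and underscore, and
that normalize() applies NFKC. If the two agree everywhere in that range, the
guard would only be a shortcut rather than a correctness requirement.

```python
import string
import unicodedata

# Assumed definition; the quoted patch does not show _ASCII_ID_CHARS.
ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")

# Categories from the quoted patch, without the two property names.
ID_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}


def is_id_char_by_category(char):
    """Category test only: no ASCII whitelist and no ord() guard."""
    # NFKC is an assumption; the patch only shows a bare normalize(char).
    normalized = unicodedata.normalize("NFKC", char)
    return unicodedata.category(normalized[0]) in ID_CATEGORIES


# ASCII chars where the category test and the whitelist disagree.
mismatches = [
    chr(i) for i in range(128)
    if is_id_char_by_category(chr(i)) != (chr(i) in ASCII_ID_CHARS)
]
print(mismatches)
```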
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21765>
_______________________________________