New submission from Joshua Landau:

This is effectively a continuation of https://bugs.python.org/issue9712.

The line `Name = r'\w+'` in Lib/tokenize.py must be changed to a regular expression that accepts Other_ID_Start at the start of an identifier and Other_ID_Continue elsewhere. Because of this, tokenize does not accept '℘·', even though it is a valid identifier. See the language reference here: https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers

I'm unsure whether Unicode normalization (i.e. the `xid` properties) needs to be dealt with too.

Credit to toriningen from http://stackoverflow.com/a/29586366/1763356.

----------
components: Library (Lib)
messages: 264145
nosy: Joshua.Landau
priority: normal
severity: normal
status: open
title: tokenize does not include Other_ID_Start or Other_ID_Continue in identifier
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26843>
_______________________________________
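A minimal sketch of the mismatch described above: '℘' (U+2118) carries the Other_ID_Start property and '·' (U+00B7) carries Other_ID_Continue, so the compiler treats '℘·' as a legal identifier, yet the `\w` character class used by tokenize.py's Name pattern covers only alphanumeric characters (per str.isalnum()) plus the underscore, so the pattern cannot match it:

```python
import re

# Valid identifier per the language reference: U+2118 is Other_ID_Start,
# U+00B7 is Other_ID_Continue.
assert '℘·'.isidentifier()

# But tokenize.py's Name pattern, r'\w+', fails to match either character,
# since \w covers only alphanumerics (str.isalnum()) and the underscore.
assert re.fullmatch(r'\w+', '℘·') is None
```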
The line in Lib/tokenize.py Name = r'\w+' must be changed to a regular expression that accepts Other_ID_Start at the start and Other_ID_Continue elsewhere. Hence tokenize does not accept '℘·'. See the reference here: https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers I'm unsure whether unicode normalization (aka the `xid` properties) needs to be dealt with too. Credit to toriningen from http://stackoverflow.com/a/29586366/1763356. ---------- components: Library (Lib) messages: 264145 nosy: Joshua.Landau priority: normal severity: normal status: open title: tokenize does not include Other_ID_Start or Other_ID_Continue in identifier type: behavior versions: Python 3.5 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue26843> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com