[issue26843] tokenize does not include Other_ID_Start or Other_ID_Continue in identifier

Joshua Landau Sun, 24 Apr 2016 18:59:23 -0700

New submission from Joshua Landau:

This is effectively a continuation of https://bugs.python.org/issue9712.


The line in Lib/tokenize.py

    Name = r'\w+'

does not match Other_ID_Start or Other_ID_Continue characters, so tokenize 
does not accept '℘·'. It must be changed to a regular expression that accepts 
Other_ID_Start at the start and Other_ID_Continue elsewhere.
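
A minimal sketch of the mismatch, using only `re` and `str.isidentifier()` rather than tokenize itself (the names `NAME` and `candidate` are illustrative, not taken from Lib/tokenize.py):

```python
import re

# The pattern quoted above from Lib/tokenize.py.
NAME = re.compile(r'\w+')

# '℘·': U+2118 (Other_ID_Start) followed by U+00B7 (Other_ID_Continue).
candidate = '\u2118\u00b7'

# The language's own identifier check accepts the string...
print(candidate.isidentifier())

# ...but \w+ does not match it in full, since neither character is
# alphanumeric in the sense used by \w.
print(NAME.fullmatch(candidate))
```

This only compares the regular expression against `str.isidentifier()`; it does not show how a corrected `Name` pattern should be written.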


See the reference here:

    https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers

I'm unsure whether Unicode normalization (i.e. the `xid` properties) needs to 
be dealt with too.
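
For context, a small sketch of what normalization means here, assuming the behaviour described in the reference above (identifiers are converted to NFKC while parsing). The variable names are hypothetical:

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI normalizes to the two letters 'fi' under NFKC.
ligature = '\ufb01'
normalized = unicodedata.normalize('NFKC', ligature + 'le')
print(normalized)

# The compiler applies the same normalization, so the binding ends up
# under the normalized name rather than the spelling used in the source.
namespace = {}
exec(ligature + 'le = 1', namespace)
print('file' in namespace)
```

Whether tokenize should report the original or the normalized spelling is left open here.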


Credit to toriningen from http://stackoverflow.com/a/29586366/1763356.

----------
components: Library (Lib)
messages: 264145
nosy: Joshua.Landau
priority: normal
severity: normal
status: open
title: tokenize does not include Other_ID_Start or Other_ID_Continue in 
identifier
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26843>
_______________________________________