Chris Angelico added the comment:

The question was raised that there might be a problem with (UTF-8) bytes versus
characters, but that's definitely not it: pythonrun.c:1362 UTF-8-decodes the
line of source and then takes its character length as the new offset. So
I don't think this is a duplicate of issue 2382.
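To illustrate why decoding before measuring matters, here is a rough Python
model of a byte-to-character offset conversion. It assumes the decoded text is
the part of the line that precedes the tokenizer's byte offset. The helper name
byte_to_char_offset is hypothetical, and this is not the actual C code in
pythonrun.c:

```python
def byte_to_char_offset(line: bytes, byte_offset: int) -> int:
    """Convert a byte offset in a UTF-8 encoded line to a character offset.

    Hypothetical helper (an assumption, not CPython's code): decode the
    UTF-8 prefix up to the byte offset and count its characters.
    """
    # "replace" tolerates an offset that falls inside a multi-byte sequence.
    prefix = line[:byte_offset].decode("utf-8", errors="replace")
    return len(prefix)


line = "x = 'é' +".encode("utf-8")
# 'é' takes two bytes in UTF-8, so the byte length exceeds the char length.
print(len(line), len(line.decode("utf-8")))   # prints: 10 9
print(byte_to_char_offset(line, len(line)))   # prints: 9
```

If the offset were left in bytes, every non-ASCII character before the error
position would shift the caret one or more columns to the right.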

(Side point: there appears to be quite a bit of complexity inside the CPython
parser to cope with the fact that it does everything in UTF-8 bytes rather than
simply decoding to text and lexing that. I presume that's for the sake of
efficiency, i.e. that it would be too slow to work through PyUnicode everywhere?)

Am looking into the rest.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27582>
_______________________________________