Tal Einat added the comment:
Indeed, I seem to have been misinterpreting the grammar, despite taking care
and reading it several times. This strengthens my opinion that we should use
str.isidentifier() rather than attempt to correctly re-implement just the parts
that we need.
Attached is a patch which fixes HyperParser._eat_identifier(), to the extent of
my testing (tests included).
When non-ASCII characters are encountered, this patch uses Terry's suggestion
of checking for valid identifier characters using ('a' +
string_part).isidentifier(). It also employs his suggestion of how to avoid
executing this check at every index, by skipping 4 characters at a time.
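
For readers following along, here is a minimal sketch of the technique described above. It is not the code from the attached patch: the function name eat_identifier, its argument order and the 4/2/1 stride sequence are assumptions made for illustration. It scans backwards from pos, uses the 'a' prefix trick to ask whether a run of characters could continue an identifier, and finally checks that the run itself is a valid identifier.

```python
def eat_identifier(text, limit, pos):
    """Return the length of the identifier ending just before text[pos].

    Hypothetical helper for illustration only; returns 0 when no valid
    identifier ends at pos. Characters before index `limit` are ignored.
    """
    def tail_ok(start):
        # Prefixing 'a' makes the test succeed for any run of valid
        # identifier-continuation characters, even one starting with a digit.
        return ('a' + text[start:pos]).isidentifier()

    i = pos
    # Move back 4 characters at a time while the whole tail stays valid,
    # so the check does not run at every index.
    while i - 4 >= limit and tail_ok(i - 4):
        i -= 4
    # Pick up the remaining 0-3 characters with one stride of 2 and one of 1.
    if i - 2 >= limit and tail_ok(i - 2):
        i -= 2
    if i - 1 >= limit and tail_ok(i - 1):
        i -= 1

    # The run must also be able to *start* an identifier (e.g. not a digit).
    if not text[i:pos].isidentifier():
        return 0
    return pos - i
```

With this sketch, eat_identifier("x = naïve_变量", 0, 12) covers the whole name "naïve_变量", while a run such as "9lives" is rejected because it cannot start an identifier.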
However, even with this fix, HyperParser.get_expression() still fails with
non-ASCII Unicode strings. This is because it uses PyParse, which doesn't
support Unicode! For example, it apparently replaces all non-ASCII characters
with 'x'. I've added (in this patch) a few tests for this, which currently fail.
FWIW, PyParse includes a comment to this effect[1]:
<quote>
The parse functions have no idea what to do with Unicode, so
replace all Unicode characters with "x". This is "safe"
so long as the only characters germane to parsing the structure
of Python are 7-bit ASCII. It's *necessary* because Unicode
strings don't have a .translate() method that supports
deletechars.
</quote>
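
To illustrate the effect of that replacement, here is a small sketch in Python 3. It is not PyParse's actual code: the class NonAsciiToX and the function mask_non_ascii are hypothetical names, and the mapping approach is only one way to get the behaviour the comment describes.

```python
class NonAsciiToX(dict):
    """Translation table: keep 7-bit ASCII, map everything else to 'x'."""

    def __missing__(self, codepoint):
        # str.translate() looks up each code point; unknown keys land here.
        return codepoint if codepoint < 128 else ord('x')


def mask_non_ascii(source):
    # Hypothetical helper: the result has the same length as the input,
    # so indices into the original text remain valid.
    return source.translate(NonAsciiToX())
```

The structure of the code survives, but the identifiers do not: mask_non_ascii("变量.attr") gives "xx.attr", so an expression recovered from the masked text no longer matches the original source.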
Properly resolving this issue will apparently require fixing PyParse to
support Unicode.
.. [1]: http://hg.python.org/cpython/file/d25ae22cc992/Lib/idlelib/PyParse.py#l117
----------
keywords: +patch
Added file:
http://bugs.python.org/file35876/taleinat.20140706.IDLE_HyperParser_unicode_ids.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue21765>
_______________________________________