Terry J. Reedy <[email protected]> added the comment:
There have been occasional discussions about IDLE not being properly Unicode
aware in some of its functions. Those discussions have foundered on the
following facts, and no fix was made.
1. The direct replacement string, your 'identcontchars', seems too big. We have
always assumed that O(n) linear scans would be too slow.
2. A frozenset should give O(1) lookup, likely fast enough, but would be even
bigger.
3. The string methods operate on and scan through multiple chars, whereas IDLE
wants to test 1 char at a time.
4. Even if the O(n*n) behavior of multiple calls is acceptable, there is no
function for Unicode continuation chars. s.isidentifier requires that the first
character be a start char, which is to say, not a digit. s.isalnum is false for
'_'. (Otherwise, it would work.)
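A quick check of point 4, as a minimal sketch that uses only built-in str
methods (the variable names are illustrative, not from IDLE):

```python
# Neither built-in test matches "identifier continuation char" on its own.
digit_alone = "1".isidentifier()          # a digit cannot start an identifier
underscore_alnum = "_".isalnum()          # '_' is not alphanumeric

# Prefixing a known start char turns isidentifier into a continuation test.
digit_after_start = ("a" + "1").isidentifier()
underscore_after_start = ("a" + "_").isidentifier()

print(digit_alone, underscore_alnum, digit_after_start, underscore_after_start)
```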
I would like to do better this time. Possible responses to the blockers:
1. Correct; reject.
2. Maybe adding an elephant is better than keeping multiple IDLE features
disabled for non-ascii users. How big?
>>> import sys
>>> fz = frozenset(c for c in map(chr, range(0x110000)) if ('a'+c).isidentifier)
>>> sys.getsizeof(fz)
33554648
Whoops, each 2 or 4 byte slice of the underlying array becomes 76 bytes + 8
bytes * size of hash array. Not practical either.
3. For at least some of the uses, the repeated calls may be fast enough.
4. We can synthesize s.isidcontinue with "c.isalnum() or c == '_'".
"c.isidentifier() or c.isdigit()" would also work but should be slower.
Any other ideas? I will look at the use cases next.
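
Response 4 can be sketched as follows. The helper names is_id_continue and
is_id_continue_alt are hypothetical, not existing str methods or IDLE
functions, and the comparison covers ASCII only. The synthesized test may not
agree with isidentifier for every non-ASCII character (combining marks, for
example, are not alphanumeric), so treat it as an approximation.

```python
def is_id_continue(c: str) -> bool:
    """Approximate test: may the single char c continue an identifier?"""
    # Hypothetical helper: str.isalnum plus the underscore special case.
    return c.isalnum() or c == "_"


def is_id_continue_alt(c: str) -> bool:
    """Alternative spelling from the text, expected to be slower."""
    return c.isidentifier() or c.isdigit()


# Reference check: prefix a known start char and ask isidentifier.
ascii_chars = [chr(i) for i in range(128)]
reference = [("a" + c).isidentifier() for c in ascii_chars]

matches = [is_id_continue(c) for c in ascii_chars] == reference
alt_matches = [is_id_continue_alt(c) for c in ascii_chars] == reference
print(matches, alt_matches)
```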
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45692>
_______________________________________