[issue45692] IDLE: define word/id chars in one place.

Terry J. Reedy Tue, 02 Nov 2021 12:20:23 -0700


Terry J. Reedy <tjre...@udel.edu> added the comment:


There have been occasional discussions about IDLE not being properly 
Unicode-aware in some of its functions.  Those discussions foundered on the 
following facts, and no fix was made.  

1. The direct replacement string, your 'identcontchars', seems too big. We have 
always assumed that O(n) linear scans would be too slow.
2. A frozen set should give O(1) lookup, likely fast enough, but would be even 
bigger.
3. The string methods operate on and scan through multiple chars, whereas IDLE 
wants to test 1 char at a time.
4. Even if the O(n*n) behavior of multiple calls is acceptable, there is no 
function for Unicode continuation chars.  s.isidentifier requires that the 
first character be a start char, which is to say, not a digit.  s.isalnum is 
false for '_'.  (Otherwise, it would work.)
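
The behavior described in points 3 and 4 can be checked directly.  This is 
only a small sketch using the built-in str methods; the variable names are 
made up for illustration.

```python
# str.isidentifier() judges the whole string, so a lone digit fails
# because a digit cannot *start* an identifier.
digit_alone = '9'.isidentifier()
# The same digit is fine once it follows a start character.
digit_after_start = 'x9'.isidentifier()
# str.isalnum() rejects the underscore, although '_' is a valid id char.
underscore_is_alnum = '_'.isalnum()
underscore_is_ident = '_'.isidentifier()

print(digit_alone, digit_after_start)            # False True
print(underscore_is_alnum, underscore_is_ident)  # False True
```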

I would like to do better this time.  Possible responses to the blockers:

1. Correct; reject.

2. Maybe adding an elephant is better than keeping multiple IDLE features 
disabled for non-ASCII users.  How big?

>>> import sys
>>> fz = frozenset(c for c in map(chr, range(0x110000)) if ('a'+c).isidentifier)
>>> sys.getsizeof(fz)
33554648

Whoops, each 2- or 4-byte slice of the underlying array becomes a separate 
76-byte object, plus 8 bytes per slot of the hash array.  Not practical either.
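
Note that the session above references `isidentifier` without calling it, so 
the filter is always true and the set holds all 0x110000 code points.  A 
sketch of the same measurement with the method actually called follows; the 
name `id_continue` is made up here, and the numbers printed will depend on 
the Python version and its Unicode database.

```python
import sys

# Collect every code point that may follow an identifier start char.
# Prefixing 'a' turns the whole-string test into a continuation test.
id_continue = frozenset(
    c for c in map(chr, range(sys.maxunicode + 1))
    if ('a' + c).isidentifier()  # note the call parentheses
)

print(len(id_continue))            # number of continuation characters
print(sys.getsizeof(id_continue))  # bytes used by the set's own table
```

sys.getsizeof counts only the set object itself, not the one-character str 
objects it refers to, so the real memory cost is higher than the figure 
printed.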

3. For at least some of the uses, the repeated calls may be fast enough.

4. We can synthesize s.isidcontinue with "c.isalnum() or c == '_'".   
"c.isidentifier() or c.isdigit()" would also work but should be slower.

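As a rough sketch, the two synthesized tests from point 4 can be written as 
one-character helpers and compared with the 'a' + c prefix test used in the 
session above.  The function names are hypothetical, not existing str methods 
or IDLE code, and each assumes c is a single character.

```python
def isidcontinue_alnum(c):
    # First suggestion: alphanumeric or underscore.
    return c.isalnum() or c == '_'

def isidcontinue_ident(c):
    # Second suggestion: a start char or a digit.
    return c.isidentifier() or c.isdigit()

def isidcontinue_prefix(c):
    # Ask str.isidentifier() whether c may follow a start char.
    return ('a' + c).isidentifier()

# Print the three answers side by side for a few sample characters.
for c in ('x', '9', '_', ' ', '-', '\u00e9', '\u0301'):
    print(repr(c), isidcontinue_alnum(c), isidcontinue_ident(c),
          isidcontinue_prefix(c))
```

The three agree on the ASCII samples, but they can differ elsewhere: a 
combining mark such as U+0301 is accepted by the prefix test and rejected by 
the other two, so the synthesized versions are approximations.
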
Any other ideas?  I will look at the use cases next.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45692>
_______________________________________