[Python-ideas] Re: Incremental step on road to improving situation around iterable strings

Andrew Barnert via Python-ideas Sun, 23 Feb 2020 13:58:52 -0800

On Feb 23, 2020, at 12:52, Steve Jorgensen <ste...@stevej.name> wrote:
> 
> The only change I am proposing is that the iterability for characters in a 
> string be moved from the string object itself to a view that is returned from 
> a `chars()` method of the string. Eventually, direct iterability would be 
> deprecated and then removed.
> 
> I do not want indexing behavior to be moved, removed, or altered, and I am 
> not suggesting that it would/should be.


That would be very weird. Something that acts like a sequence in every 
way (indexing, slicing, Sequence methods like count, other methods that return 
indices, etc.) except that it isn’t Iterable doesn’t feel like Python. Python 
even lets you iterate over “old-style semi-sequences”: things which define 
__getitem__ to work with contiguous indices starting from 0 until they raise 
IndexError.
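
To make the "old-style semi-sequence" point concrete, here is a minimal sketch. The class name Squares is made up for illustration; it defines only __getitem__, and iteration still works through the legacy fallback in iter():

```python
class Squares:
    """Old-style semi-sequence: only __getitem__, no __iter__ or __len__."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, index):
        # Valid indices are 0..n-1; IndexError tells iter() where to stop.
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


squares = Squares(4)
# iter() falls back to calling __getitem__(0), __getitem__(1), ...
items = list(squares)
has_iter = hasattr(Squares, "__iter__")
print(items, has_iter)  # [0, 1, 4, 9] False
```

So an object that indexes from 0 but refuses to iterate would be stricter than even this oldest protocol.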

I think if you want to move iteration to chars, you’d want to move sequence 
behavior there too.
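
A rough sketch of what moving the whole sequence behavior onto the view might look like. The names Text and CharsView are hypothetical, made up for illustration only; this is not a real or proposed API, just one way to read the idea:

```python
from collections.abc import Sequence


class CharsView(Sequence):
    """Hypothetical view that carries the sequence behavior of a string."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        # Indexing and slicing are delegated to the underlying str.
        return self._text[index]


class Text:
    """Hypothetical string-like type that is neither iterable nor indexable."""

    def __init__(self, value):
        self._value = value

    def chars(self):
        return CharsView(self._value)


t = Text("abca")
chars = list(t.chars())        # iteration goes through the view
count_a = t.chars().count("a")  # Sequence mixin methods come for free
try:
    iter(t)
    directly_iterable = True
except TypeError:
    directly_iterable = False
```

Here count, index, and iteration all live on the view, so there is no object that is "a sequence except for iteration".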

Also, I think you’d still want the chars view to iterate a new char type rather 
than strs or chars views. Otherwise you still have the infinite regress 
problem. It only shows up when you decide to explicitly recurse into str 
(iterate anything that’s iterable, and iterate chars() on anything that’s a 
str), but it’s just as bad as the current state when you do: there’s still no 
way to say “recursively iterate strings, but only down to characters, not 
infinitely”.
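
The regress is easy to reproduce with today's str, because a one-character string iterates to itself. A minimal sketch (the function names flatten and flatten_guarded are just illustrative):

```python
def flatten(obj):
    """Naively recurse into anything iterable."""
    try:
        iterator = iter(obj)
    except TypeError:
        yield obj  # not iterable: treat as a leaf
        return
    for item in iterator:
        yield from flatten(item)


def flatten_guarded(obj):
    """Same, but treat every str as a leaf (the usual workaround)."""
    if isinstance(obj, str):
        yield obj
        return
    try:
        iterator = iter(obj)
    except TypeError:
        yield obj
        return
    for item in iterator:
        yield from flatten_guarded(item)


# "a" iterates to "a", which iterates to "a", and so on forever.
self_similar = next(iter("a")) == "a"

try:
    list(flatten([1, ["ab", [2]]]))
    regress = False
except RecursionError:
    regress = True

guarded = list(flatten_guarded([1, ["ab", [2]]]))
```

The guarded version stops at whole strings. Stopping at individual characters needs something that is not itself iterable, which is what a separate char type would provide.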

I’m not sure I like the idea in any variation for Python, but a few more points 
in favor of it:

The chars view would open the door for additional views on strings. See Swift, 
which makes you state explicitly whether you want to iterate UTF-8 code units 
(bytes), UTF-32 code points, or extended grapheme clusters, instead of just 
picking one and that’s what you get (and the other two require constructing 
some separate object that copies stuff from the string). After all, a string is 
an iterable of all of those things; the fact that it happens to be stored as an 
array of Latin-1 bytes, UCS-2 code units, or UTF-32 code points, with a cache of 
UTF-8 bytes, doesn’t force us to treat it as an iterable of UTF-32 code points; 
only legacy reasons do.
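
For illustration, here is how the different "iterables" of one string can be pulled out in Python today. Each one other than code points requires building a separate object, and the stdlib has no grapheme cluster segmentation, so that view is left out:

```python
# 'e', U+0301 COMBINING ACUTE ACCENT, U+1F600 GRINNING FACE
s = "e\u0301\U0001F600"

# What iterating a str gives you today: code points.
code_points = [ord(c) for c in s]

# UTF-8 code units: requires encoding into a new bytes object.
utf8_units = list(s.encode("utf-8"))

# UTF-16 code units: two bytes each, so halve the encoded length.
utf16_unit_count = len(s.encode("utf-16-le")) // 2

print(len(code_points), len(utf8_units), utf16_unit_count)  # 3 7 4
```

A reader would see two characters here (an accented e and an emoji), which matches none of the three counts.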

And having a strutf8view could mean that in many apps, all bytes objects are 
binary data rather than encoded text, which makes the bytes type more 
semantically meaningful in those apps.

It could also make bridging libraries to languages where strings aren’t 
iterable more reasonable. For example, IIRC, pyobjc NSString objects today have 
methods to iterate strs so they can duck-type as strings; if strings weren’t 
Iterable, they could be much closer to a trivial pure bridge to the ObjC type.

Finally, a bunch of unicodedata functions and so on that only make sense on 
single characters have to take str today and raise an exception (TypeError, in 
unicodedata’s case) if passed multiple characters. (There are even some Unicode 
functions that only make sense on single EGCs, but I think Python doesn’t 
provide any of them.) Passing a char object, you’d know statically that it 
makes sense; passing a str object, you don’t.
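
A quick sketch of the current behavior, using real stdlib calls. The length check only happens at runtime, since the parameter type is plain str either way:

```python
import unicodedata

# Fine: a str of length 1.
category = unicodedata.category("a")
name = unicodedata.name("a")

# Same static type (str), but rejected at call time.
try:
    unicodedata.category("ab")
    error = None
except TypeError as exc:
    error = type(exc).__name__

# The builtin ord() has the same shape of problem.
try:
    ord("ab")
    ord_error = None
except TypeError as exc:
    ord_error = type(exc).__name__

print(category, name, error, ord_error)
```

With a distinct char type, a type checker could reject the second and third calls before the code ever runs.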
