On Mar 23, 2020, at 12:40, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Tue, Mar 24, 2020 at 6:31 AM Andrew Barnert <abarn...@yahoo.com> wrote:
>> 
>>> On Mar 23, 2020, at 04:51, Chris Angelico <ros...@gmail.com> wrote:
>>> 
>>> Right, which is why for a proposal like this, it's best to start with
>>> the simple and straightforward option of case sensitivity and precise
>>> matching. Removing a prefix of "a\u0301" will not remove a leading
>>> "\xe1" and vice versa (just as those two strings don't compare equal).
>> 
>> Agreed, but I think it’s not just “to start with”, but forever, or at least 
>> as long as Python strings are sequences of Unicode code points. If 
>> "Café".startswith("Cafe\u0301") is false, "Café".stripprefix("Cafe\u0301") 
>> had better not strip anything. And as long as "é" in "Cafe\u0301" and 
>> any(ch=="é" for ch in "Cafe\u0301") are false, startswith is correct.
>> 
>> By comparison, in Swift, "Café".hasPrefix("Cafe\u{0301}") is true, because 
>> "Cafe\u{0301}" is a sequence of four Characters (extended grapheme clusters), 
>> the fourth of which is 'é', as opposed to Python where it’s a sequence of five 
>> Unicode code points. 
>> And of course Swift also has a slew of methods to do things like localized 
>> vs. default case-insensitive equality, substring, etc. testing, none of 
>> which Python has, or should have, as long as its strings are made of code 
>> points rather than scalars (or EGCs or whatever).
> 
> Maybe this would be something for the locale or unicodedata module?

Maybe. But a complete suite of functions for treating strings as made of 
Unicode scalars or EGCs or whatever seems like a lot of design work, and I 
don’t know if there’s enough demand for anyone to be willing to do it. Swift is 
a different story for a lot of reasons (brand new language, iterator model that 
makes “non-randomly-accessible sequence” a sensible thing, corporate team, a 
much worse status quo ante where strings were sequences of UTF-16 code units, a 
need to interface natively with Cocoa and its NFKD decomposed strings, …). For 
Python, it seems like if nobody’s put anything (other than thin wrappers around 
ICU) on PyPI, probably nobody needs support in the stdlib.
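
To make the code-point behavior discussed above concrete, here is a minimal 
sketch using only the stdlib unicodedata module. The helper name 
strip_prefix_nfc is hypothetical (it is not an existing or proposed str 
method), and normalizing both sides to NFC is just one possible policy, not a 
full grapheme-aware solution.

```python
import unicodedata

composed = "Caf\u00e9"     # 4 code points, ends in precomposed U+00E9
decomposed = "Cafe\u0301"  # 5 code points: "e" followed by combining acute

# Python compares strings code point by code point, so these differ.
print(len(composed), len(decomposed))   # 4 5
print(composed == decomposed)           # False
print(composed.startswith(decomposed))  # False
print("\u00e9" in decomposed)           # False


def strip_prefix_nfc(text, prefix):
    """Hypothetical helper: strip `prefix` after normalizing both to NFC.

    Assumption: canonical equivalence is the desired notion of "same".
    Note that the result is the NFC-normalized text, not the original.
    """
    text = unicodedata.normalize("NFC", text)
    prefix = unicodedata.normalize("NFC", prefix)
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


print(repr(strip_prefix_nfc(composed + " au lait", decomposed)))  # ' au lait'
```

Even this small helper has to decide whether to return normalized or original 
text, which hints at how much design work a complete suite would involve.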

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EARQY6BLX4T74HZ22L54QGEOBYT2U5TT/
Code of Conduct: http://python.org/psf/codeofconduct/
