On Mon, Mar 23, 2020 at 7:06 PM Alex Hall <alex.moj...@gmail.com> wrote:
>
> I think I'm missing something, why is case insensitivity a mess?
>

Because there are many characters that case fold in strange ways.
"ıIiİ".casefold() == 'ıiii̇', which means that lowercase dotless ı
doesn't casefold to the same thing as uppercase dotless I does. Some
characters case fold to strings of different lengths, such as "ß",
which casefolds to "ss". I haven't even tried what happens with
combining characters vs combined characters. And Unicode case folding
is already a simplified version of reality; what actual humans expect
can be even more complicated, such as (I think) German case folding
rules being different for names and for book titles, and the way that
umlauted letters are case folded.
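
A quick way to see these cases for yourself is the sketch below. It only
uses str.casefold() and unicodedata.normalize() from the standard library;
the variable names are made up for illustration, and the combining-character
check at the end is a minimal probe of the case not tried above, not a full
survey.

```python
import unicodedata

# The four i-like letters from the example above:
# U+0131 (dotless ı), U+0049 (I), U+0069 (i), U+0130 (dotted İ).
sample = "\u0131Ii\u0130"
folded = sample.casefold()

# Show the code points of the folded string, one entry per code point.
print([f"U+{ord(c):04X}" for c in folded])

# Dotless ı is left alone, but I folds to a plain dotted i,
# so the "lowercase" and "uppercase" dotless letters end up different.
dotless_lower = "\u0131".casefold()
dotless_upper = "I".casefold()

# Dotted İ folds to TWO code points: i followed by COMBINING DOT ABOVE.
dotted_upper = "\u0130".casefold()

# ß also changes length when folded.
sharp_s = "\u00df".casefold()

# Combining vs combined: precomposed é versus e + COMBINING ACUTE ACCENT.
# casefold() does not normalize, so the two spellings stay different
# unless the strings are normalized (here to NFC) first.
composed = "\u00e9"
decomposed = "e\u0301"
same_after_fold = composed.casefold() == decomposed.casefold()
same_after_nfc = (
    unicodedata.normalize("NFC", composed).casefold()
    == unicodedata.normalize("NFC", decomposed).casefold()
)
print(same_after_fold, same_after_nfc)
```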

On the other hand, this might actually mean it's *better* to have a
dedicated case-insensitive-cut-prefix operation. It would be difficult
to define it in easy terms, but basically it should be such that the
returned string (if not identical to the original) is the longest
suffix of the original string such that, if the returned string were
appended to the prefix and the result case folded, it would be the
same as the original string case folded. But there could be other
definitions, just as complicated, and not necessarily more correct.
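
To make that definition concrete, here is a rough sketch of it. The name
cut_prefix_casefold is hypothetical (nothing like it exists in the stdlib),
and this is just a brute-force reading of the wording above, quadratic in
the length of the string, not a proposal for how it should actually be
implemented.

```python
def cut_prefix_casefold(text: str, prefix: str) -> str:
    """Hypothetical helper: remove `prefix` from `text`, ignoring case.

    Returns the longest suffix of `text` such that prefix + suffix,
    case folded, equals `text` case folded. If no suffix qualifies,
    `text` is returned unchanged.
    """
    target = text.casefold()
    # Walk the start of the candidate suffix from left to right, so the
    # first match found is the longest qualifying suffix.
    for start in range(len(text) + 1):
        suffix = text[start:]
        if (prefix + suffix).casefold() == target:
            return suffix
    return text


# Prefix and text differ in length once "ß" is involved, in either direction.
print(repr(cut_prefix_casefold("STRASSE 12", "stra\u00dfe")))
print(repr(cut_prefix_casefold("Stra\u00dfe 12", "STRASSE")))

# A prefix that would end in the middle of a folded character does not match,
# so the original comes back untouched.
print(repr(cut_prefix_casefold("\u00dfx", "s")))
```

Note that under this definition "s" is not a prefix of "ßx" even though the
folded forms "s" and "ssx" would say otherwise, which is one of the places
where a different, equally defensible definition could disagree.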

In any case, this can (and in my opinion should) be deferred. Start
with the simple one that doesn't care about all these complexities,
and then expand from there as the need arises.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BI7YMJPJJV6BTUXJVGORZIF4NZUIVPM3/