On Fri, Jun 10, 2022 at 12:48 PM Tobias Bussmann <t.bussm...@gmx.net> wrote:
> Perhaps I can shed some light on this matter:

Hi Tobias,

Oh, thanks for your answers.  Definitely a few bits of interesting
archeology I was not aware of.

> Apple's libc collations have always been a bit special in that regard, even
> for the non-UTF-8 ones. Rooted in ancient FreeBSD, they "try to keep collating
> table backward compatible with ASCII", so upper- and lower-case characters
> are separated (there are exceptions like 'cs_CZ.ISO8859-2').

Wow.  I see that I can sort the English dictionary the way most people
expect by pretending it's Czech.  What a mess!

> With your smoke test "sort /usr/share/dict/words" on a modern macOS, you won't
> see a difference between "C" and "en_US.UTF-8", but you can produce one by
> comparing
>
>   ( echo '5£'; echo '£5' ) | LC_COLLATE=en_US.UTF-8 sort
>
> against
>
>   ( echo '5£'; echo '£5' ) | LC_COLLATE=C sort
>
> Or test with
>
>   diff -q <(LC_COLLATE=C sort /usr/share/dict/words) <(LC_COLLATE=es_ES.UTF-8 sort /usr/share/dict/words)

I see, so it does *something*, just not what anybody wants.
