Unfortunately, multibyte collation is simply not implemented in Mac OS X, so no alternate locale definition will fix it. As far as I can tell, this is documented only in the BUGS section of `man wcscoll`:
    BUGS
         The current implementation of wcscoll() only works in single-byte
         LC_CTYPE locales, and falls back to using wcscmp() in locales with
         extended character sets.

(https://opensource.apple.com/source/Libc/Libc-1272.250.1/string/FreeBSD/wcscoll.3.auto.html)

Eric

On Wed, Sep 25, 2019 at 8:59 AM Peng Yu <pengyu...@gmail.com> wrote:
> I want to make my `sort` to be machine-independent and always use the
> correct Unicode sort order. Is there a way to do so?
>
> I don't know how to check where en_US.UTF-8 comes from. Do you know
> how to check it? (I use Mac OS X.)
>
> On 9/25/19, Eric Blake <ebl...@redhat.com> wrote:
> > On 9/25/19 10:20 AM, Peng Yu wrote:
> >> Hi,
> >>
> >> It seems that "café" should be sorted before "caff" in Unicode.
> >>
> >> https://github.com/jtauber/pyuca
> >>
> >> But `sort` does not do so.
> >>
> >> $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort
> >> cafe
> >> caff
> >> café
> >> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort
> >> cafe
> >> caff
> >> café
> >>
> >> How to make `sort` sort according to Unicode order? Thanks.
> >
> > You'll have to write a locale definition where strcoll() sorts in the
> > order you want. Coreutils sort is calling strcoll(), and if it doesn't
> > sort the way you think it should, the bug is in your locale and not in
> > coreutils. You'll want to report this issue to whoever provided your
> > en_US.UTF-8 locale (perhaps glibc?)
> >
> > --
> > Eric Blake, Principal Software Engineer
> > Red Hat, Inc. +1-919-301-3226
> > Virtualization: qemu.org | libvirt.org
>
> --
> Regards,
> Peng
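To illustrate why the wcscmp() fallback produces the order Peng saw: wcscmp() compares wide characters numerically, which in a UTF-8 locale amounts to Unicode code-point order, and precomposed "é" (U+00E9) sorts after every ASCII letter, including "f" (U+0066). The Python sketch below contrasts that with a deliberately simplified two-level ordering (compare base letters first, break ties by code point). The helper names are hypothetical, and this is only a rough approximation of the idea behind the Unicode Collation Algorithm, not UCA itself; a full implementation such as pyuca (linked in the thread) handles far more cases.

```python
import unicodedata


def codepoint_key(s):
    # Plain code-point comparison, roughly what the wcscmp() fallback
    # amounts to: U+00E9 ('é') sorts after every ASCII letter.
    return s


def base_letters(s):
    # NFD splits precomposed 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT;
    # dropping the combining marks leaves only the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def simplified_collation_key(s):
    # Hypothetical two-level key (NOT the real UCA): compare base letters
    # first, and only on a tie fall back to code points, so an accented
    # word follows its unaccented twin but precedes "caff".
    return (base_letters(s), s)


# "caf\u00e9" is the precomposed form, matching the output in the thread.
words = ["cafe", "caff", "caf\u00e9"]
print(sorted(words, key=codepoint_key))             # → ['cafe', 'caff', 'café']
print(sorted(words, key=simplified_collation_key))  # → ['cafe', 'café', 'caff']
```

The first line reproduces the Mac OS X `sort` output quoted above; the second gives the "café" before "caff" order Peng expected from pyuca.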