On 21 Apr 2023, at 12:01, Ronald Klop <ronald-li...@klop.ws> wrote:
> From: Poul-Henning Kamp <p...@phk.freebsd.dk>
> Date: Monday, 17 April 2023 23:06
> To: curr...@freebsd.org
> Subject: find(1): I18N gone wild ?
> This surprised me:
> 
>     # mkdir /tmp/P
>     # cd /tmp/P
>     # touch FOO
>     # touch bar
>     # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>     ./FOO
>     # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>     ./FOO
>     ./bar
> 
> Really ?!
...
> My Mac and a Linux server only give ./FOO in both cases. Just my 2
> cents.

Same here. However, I have read that with Unicode, you should *never*
use ranges like [A-Z] or [0-9], but character classes instead. On macOS
and Linux, [[:alpha:]] does seem to match both files:

$ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
./BAR
./foo

and only the lowercase file with [[:lower:]]:

$ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
./foo

But on FreeBSD, these don't work at all:

$ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
<nothing>

$ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
<nothing>
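
If the goal is simply "names starting with an ASCII capital", one way to
sidestep the locale question is to force the C locale for that single
command, so that [A-Z] is compared by byte value. A minimal sketch; the
scratch directory and file names are made up for illustration, and I have
not run this on the FreeBSD machine above:

```shell
#!/bin/sh
# Sketch: force the C locale so [A-Z] is a plain byte range (0x41-0x5A).
# The scratch directory and file names are hypothetical, for illustration.
dir=$(mktemp -d)
touch "$dir/FOO" "$dir/bar"

# LC_ALL overrides LANG and all other LC_* variables for this one command.
matches=$(cd "$dir" && LC_ALL=C find . -name '[A-Z]*' -print)
echo "$matches"

# Clean up the scratch directory.
rm -rf "$dir"
```

The trade-off is that non-ASCII file names are then matched as raw bytes
rather than as characters, so this only helps when the pattern itself is
pure ASCII.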

This is an interesting rabbit hole... :)

-Dimitry
