On Tue, Jul 18, 2023 at 9:55 AM Chet Ramey <chet.ra...@case.edu> wrote:
> Unicode normalization on macOS has always been a pain in the ass.

I can see that!

> This is the basic assumption that drives all the decisions: character input
> you get from the terminal is in NFC, and files from the file system (names
> and usually contents) are in NFD.

I guess the point of this patch was that this assumption does not
always hold -- neither file system names nor terminal input have any
guaranteed normalization.

I'll just give one more example, sorry if this is repetitive.

In Finder, create two directories: señor and señora. List the parent
directory in a Terminal window:
$ ls
señor  señora

Type `echo ' and then copy-paste the first filename from `ls' output:
$ echo señor

Note that tab completion doesn't work. Append a `*' and run the command:
$ echo señor*
señor*

And if you're writing a script you have the same issue:

$ bash -O failglob -c 'set -- *; echo "$1"*'
bash: line 1: no match: señor*
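
For what it's worth, the mismatch is easy to see at the byte level. This is only a minimal sketch: it assumes the pasted pattern is in NFD and the name it gets compared against is in NFC, and the variable names are made up for illustration. It compares plain strings and does not touch the filesystem.

```shell
# Same visible name, built from explicit UTF-8 bytes (octal escapes).
nfc=$(printf 'se\303\261or')    # U+00F1, precomposed n with tilde
nfd=$(printf 'sen\314\203or')   # "n" followed by U+0303 combining tilde

# wc -c counts bytes; the arithmetic expansion strips any padding wc adds.
nfc_len=$(( $(printf '%s' "$nfc" | wc -c) ))
nfd_len=$(( $(printf '%s' "$nfd" | wc -c) ))
printf 'NFC form: %d bytes, NFD form: %d bytes\n' "$nfc_len" "$nfd_len"

# Shell pattern matching on plain strings: an NFD prefix pattern
# against an NFC name, with no normalization applied to either side.
case $nfc in
    "$nfd"*) match=yes ;;
    *)       match=no ;;
esac
printf 'NFD pattern matches NFC name: %s\n' "$match"
```

The two strings render identically in a terminal but differ in length and content, so a comparison that normalizes only one side cannot succeed.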

I'm not sure if you're saying that this is correct behavior, or if
it's not worth fixing (I'm fine with that), or if there's something
wrong with my proposed fix.

> > NB: while HFS+ stores NFD names, APFS preserves normalization, so we
> > can get either NFC or NFD text back from readdir.
>
> Well, that doesn't help. But I haven't seen any NFC text coming back from
> readdir on any of my macs.
>
> > Currently, Bash never actually converts to NFD.  The fnx_tofs()
> > function is there but it is never used.  Instead, Bash converts
> > filenames to NFC with fnx_fromfs() before comparing with either the
> > glob pattern or the completion hint text (which is never converted).
>
> Correct. It's a one-way conversion, since you only have to convert one
> of the two different forms, and the current implementation works on text
> entered interactively (which is in NFC). When you're reading a script, you
> don't have to perform any conversion at all; your NFD examples all work
> fine when run from a script.
>
> > Since access is normalization-insensitive, we just need to normalize to
> > _some_ form, so going to NFC is fine, but if we're going to do that
> > we should normalize both the filesystem name and the text being
> > compared.
>
> The idea is that since the text entered interactively at the terminal is
> already in NFC, the current implementation converts only what it knows is
> coming from the keyboard.

Did you mean "coming from the filesystem"?  There are two places where
fnx_fromfs is called: bash_filename_rewrite_hook, and glob_vector.
Both of these operate on directory entries, not on keyboard input.

> > If there's a match, globs expand to the filenames (NFC or NFD) as
> > returned by readdir(), and Readline completes with NFC-normalized
> > versions of the names.  I think this makes sense.
>
> Because NFC is what you get from terminal input.
>
> > What doesn't work quite right currently though is that glob patterns
> > with NFD text never match anything, and completion prefixes with NFD
> > text never expand to anything.
>
> When entered from the terminal. It goes back to the basic assumption: NFC
> is what you get from the terminal, so you have to convert from the file
> system normalization form when you're sure what you want to compare is
> coming from the terminal.
