On Tue, Jul 18, 2023 at 9:55 AM Chet Ramey <chet.ra...@case.edu> wrote:

> Unicode normalization on macOS has always been a pain in the ass.

I can see that!

> This is the basic assumption that drives all the decisions: character input
> you get from the terminal is in NFC, and files from the file system (names
> and usually contents) are in NFD.

I guess the point of this patch was that this assumption does not always
hold -- neither file system names nor terminal input have any guaranteed
normalization.

I'll just give one more example, sorry if this is repetitive.

In Finder, create two directories: señor and señora. List the parent
directory in a Terminal window:

    $ ls
    señor señora

Type `echo ' and then copy-paste the first filename from `ls' output:

    $ echo señor

Note that tab completion doesn't work. Append a `*' and run the command:

    $ echo señor*
    señor*

And if you're writing a script you have the same issue:

    $ bash -O failglob -c 'set -- *; echo "$1"*'
    bash: line 1: no match: señor*

I'm not sure if you're saying that this is correct behavior, or if it's not
worth fixing (I'm fine with that), or if there's something wrong with my
proposed fix.

> > NB: while HFS+ stores NFD names, APFS preserves normalization, so we
> > can get either NFC or NFD text back from readdir.
>
> Well, that doesn't help. But I haven't seen any NFC text coming back from
> readdir on any of my macs.
>
> > Currently, Bash never actually converts to NFD. The fnx_tofs()
> > function is there but it is never used. Instead, Bash converts
> > filenames to NFC with fnx_fromfs() before comparing with either the
> > glob pattern or the completion hint text (which is never converted).
>
> Correct. It's a one-way conversion, since you only have to convert one
> of the two different forms, and the current implementation works on text
> entered interactively (which is in NFC). When you're reading a script, you
> don't have to perform any conversion at all; your NFD examples all work
> fine when run from a script.
>
> > Since access is normalization-insensitive, we just need to normalize to
> > _some_ form, so going to NFC is fine, but if we're going to do that
> > we should normalize both the filesystem name and the text being
> > compared.
>
> The idea is that since the text entered interactively at the terminal is
> already in NFC, the current implementation converts only what it knows is
> coming from the keyboard.

Did you mean "coming from the filesystem"? There are two places where
fnx_fromfs is called: bash_filename_rewrite_hook, and glob_vector. Both of
these operate on directory entries, not on keyboard input.

> > If there's a match, globs expand to the filenames (NFC or NFD) as
> > returned by readdir(), and Readline completes with NFC-normalized
> > versions of the names. I think this makes sense.
>
> Because NFC is what you get from terminal input.
>
> > What doesn't work quite right currently though is that glob patterns
> > with NFD text never match anything, and completion prefixes with NFD
> > text never expand to anything.
>
> When entered from the terminal. It goes back to the basic assumption: NFC
> is what you get from the terminal, so you have to convert from the file
> system normalization form when you're sure what you want to compare is
> coming from the terminal.
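
To make the mismatch concrete without needing a Mac, here is a minimal sketch
that builds the NFC and NFD spellings of "señor" byte by byte and compares
them the way an unnormalized pattern match would. It assumes a POSIX shell
whose printf accepts octal escapes; the variable names are purely
illustrative and have nothing to do with Bash's internals or with the
fnx_* functions.

```shell
# NFC: "ñ" is the single code point U+00F1 (UTF-8 bytes C3 B1).
nfc=$(printf 'se\303\261or')
# NFD: "n" followed by U+0303 COMBINING TILDE (UTF-8 bytes CC 83).
nfd=$(printf 'sen\314\203or')

# Count the bytes in each spelling (tr strips the padding some wc builds add).
nfc_bytes=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_bytes=$(printf '%s' "$nfd" | wc -c | tr -d ' ')

# Plain string comparison: the two spellings are different byte strings.
if [ "$nfc" = "$nfd" ]; then same=yes; else same=no; fi

# Pattern matching with no normalization on either side, analogous to
# matching the pattern señor* (typed in NFC) against a name stored in NFD.
case "$nfd" in
    "$nfc"*) match=yes ;;
    *)       match=no ;;
esac

printf 'NFC bytes: %s\nNFD bytes: %s\nequal: %s\nNFC pattern matches NFD name: %s\n' \
    "$nfc_bytes" "$nfd_bytes" "$same" "$match"
```

Both spellings render identically in a terminal, but they differ from the
third byte on, so the comparison only succeeds if both sides are brought to
the same form first.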