"Kent Karlsson" wrote on 2002-02-23 13:33 UTC:
> Also of interest here may be that, IIRC, HFS+ and UFS (the Apple
> file systems) represent all file names in NFD (and for UFS: in UTF-8).
> NFD, not NFC.
Oops, I didn't know that. That is far more of a concern when files are exchanged between Macs and Linux, in particular since MacOS in its latest incarnation is just running on top of Berkeley Unix. I expect the Mac platform to be integrated with Unix systems far more frequently, via NFS, tar, pkzip, etc.

Alternative solutions:

  a) Linux goes NFD.
  b) MacOS goes NFC.
  c) Normalization when transferring files between the two worlds.
  d) Both sides learn to work well with either form.

The reasons for Linux preferring NFC were the following, and as far as I can see they are still valid:

  - NFC is far closer to existing practice with ISO 8859, JIS, etc.
  - The W3C has said that NFC shall be what the Web uses.

The Linux world will in the long run have to learn how to use combining characters anyway, as some scripts depend on them (Thai most notably), so the occasional NFD file from a Mac shouldn't cause major disruption. GUI file selection will work as before, independent of coding variants, and for the shell I can see numerous small improvements to globbing and to the TAB filename expansion mechanism that would make handling the NFC/NFD difference far more convenient.

It would be nice if the MacOS world and the Linux world used the same convention, but if not, I think how easily users can deal with the difference is a matter of user interface maturity.

Example: You have two files

  Müller
  Müllerin

in a directory, the first in NFD, the second in NFC. If you press M+TAB in a yet-to-be-written UTF-8 aware version of bash, it will fail to expand to Müller, as the two strings differ after the first letter. Typing Mu+TAB will expand one, and typing Mü+TAB will expand the other, so there is a solution for experienced users. A user interface improvement would be to provide two control keys that let the user scroll through the list of files that are available in the current state of the TAB selection.
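The Müller/Müllerin situation can be reproduced in a few lines of Python with the standard `unicodedata` module. This is only a sketch under stated assumptions: `complete` and `complete_normalized` are hypothetical helpers, and TAB completion is modelled as plain code-point prefix matching followed by a longest common prefix, which is not necessarily how bash or readline actually work. The second helper compares NFC forms of the typed prefix and of the names, but returns the original on-disk names, since a byte-exact Unix file system will only open the spelling that is actually stored.

```python
import os.path
import unicodedata

# The two file names from the example: "Müller" stored in NFD
# (u + U+0308 COMBINING DIAERESIS), "Müllerin" stored in NFC (U+00FC).
names = [
    unicodedata.normalize("NFD", "Müller"),
    unicodedata.normalize("NFC", "Müllerin"),
]

def complete(prefix, candidates):
    """Hypothetical naive completion: keep candidates that start with the
    prefix (code point by code point, no normalization) and extend the
    prefix to their longest common prefix."""
    matches = [c for c in candidates if c.startswith(prefix)]
    if not matches:
        return prefix, matches
    return os.path.commonprefix(matches), matches

def complete_normalized(prefix, candidates, form="NFC"):
    """Hypothetical variant: normalize the typed prefix and every candidate
    to the same form before matching, so NFC and NFD spellings are treated
    alike. The matching names are returned in their original (on-disk) form."""
    def key(s):
        return unicodedata.normalize(form, s)
    p = key(prefix)
    matches = [c for c in candidates if key(c).startswith(p)]
    if not matches:
        return prefix, matches
    return os.path.commonprefix([key(c) for c in matches]), matches
```

With these definitions, `complete("M", names)` stops at "M", `complete("Mu", names)` selects only the NFD name and `complete("M\u00fc", names)` only the NFC one, while `complete_normalized("M", names)` extends to "Müller" with both files as candidates. One wrinkle of choosing NFC as the common form: a typed "Mu" no longer matches anything, because "Mu" is not a prefix of the NFC string "Müller", so a real implementation would have to decide how to treat a base letter whose combining mark has not been typed yet.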
I could also imagine bash doing a normalization, such that a prefix entered in one normalization form will also match file names stored in the other form. There are lots of ways to implement this conveniently, and the only real problem is to get the bash maintainers interested in UTF-8 at all ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/