"Kent Karlsson" wrote on 2002-02-23 13:33 UTC:
> Also of interest here may be that, IIRC, HFS+ and UFS (the Apple
> file systems) represent all file names in NFD (and for UFS: in UTF-8).
> NFD, not NFC.

Oops, I didn't know that. That makes it far more of a concern when files
are exchanged between Macs and Linux. In particular, since MacOS in its
latest incarnation just runs on top of Berkeley Unix, I expect the Mac
platform to be integrated with Unix systems far more frequently, via
NFS, tar, pkzip, etc.

Alternative solutions:

 a) Linux goes NFD.
 b) MacOS goes NFC.
 c) Normalization when transferring files between the two worlds.
 d) Both sides learn to work well with either form.
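
To make option (c) concrete, here is a minimal sketch in Python. It
assumes the file names are already decoded to Unicode strings; the
helper name to_nfc is made up for this illustration, and a real
transfer tool would also have to deal with two names that collide once
they are normalized.

```python
import unicodedata

def to_nfc(name):
    """Return the NFC form of a file name (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)

# NFD spelling: 'u' followed by COMBINING DIAERESIS (U+0308).
nfd_name = "Mu\u0308ller"
nfc_name = to_nfc(nfd_name)

# The two forms differ in code point count and in their UTF-8 bytes.
print(len(nfd_name), len(nfc_name))
print(nfd_name.encode("utf-8"))
print(nfc_name.encode("utf-8"))
```

Going the other way (Linux to Mac) would use "NFD" as the first
argument instead.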

The reasons for Linux preferring NFC were

  - That's far closer to existing practice with ISO 8859, JIS, etc.
  - The W3C has said that NFC shall be what the Web uses

and are as far as I can see still valid. The Linux world will in the
long run have to learn how to use combining characters anyway, as some
scripts depend on them (Thai most notably), so the occasional NFD file
from a Mac shouldn't cause major disruption. GUI file selection will run
as before, independent of coding variants, and for the shell I can see
numerous tiny improvements to globbing and the TAB filename expansion
mechanism to make handling the NFC/NFD difference far more convenient.

It would be nice if the MacOS world and the Linux world used the same
convention, but if not, I think how easy it will be to deal with the
difference is a matter of user interface maturity.

Example:

You have two files

  Müller
  Müllerin

in a directory, the first in NFD, the second in NFC. If you press M+TAB
in a yet-to-be-written UTF-8-aware version of bash, it will fail to
expand to Müller, as the two strings differ after the first letter.
Typing Mu+TAB will expand one, and typing Mü+TAB will expand the other,
so there is a solution for experienced users. A user interface
improvement would be to provide two control keys that allow the user to
scroll through the list of files that are available in the current state
of the TAB selection. I could also imagine bash doing a normalization,
such that entering a prefix in one normalization form will include the
file name in the other one as well. There are lots of ways to implement
this in a convenient way, and the only real problem is to get the bash
maintainers interested in UTF-8 at all ...
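
The normalizing variant of prefix matching could look roughly like the
following Python sketch. This is not how bash works, only an
illustration of the idea; the function name complete and the list
files are invented for the example, and NFC is picked arbitrarily as
the form to compare in.

```python
import unicodedata

def complete(prefix, names):
    """Return the names whose NFC form starts with the NFC form of prefix."""
    key = unicodedata.normalize("NFC", prefix)
    return [n for n in names
            if unicodedata.normalize("NFC", n).startswith(key)]

# The two files from the example: the first in NFD, the second in NFC.
files = ["Mu\u0308ller", "M\u00fcllerin"]

# A prefix typed in either form now matches both files.
print(complete("M\u00fc", files) == files)
print(complete("Mu\u0308", files) == files)
```

One consequence of comparing in NFC is that the bare prefix "Mu" no
longer matches the NFD file, because after normalization its second
character is the precomposed ü. Comparing in NFD instead would let "Mu"
match both names.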

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
