> I was thinking about this: maybe the NFS server could enforce
> normalization form 'C' so that only the precomposed variant:

Note that normalisation form C is NOT "no combining characters",
it is "maximally (re)composed according to Unicode 3.0".  Combining
characters can remain, and "new" precomposed characters (those
allocated after that version) are decomposed, never composed (so
there is no point in allocating such characters; though one such
snuck in for Unicode 3.2).
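
To make this concrete, here is a minimal sketch using Python's standard
unicodedata module (my choice of tool; nothing in this thread depends on
it). It shows a combining mark that survives NFC because no precomposed
form exists, and a precomposed character that NFC decomposes. U+2ADC
FORKING is used only as an illustration, on the assumption that the
interpreter's Unicode tables list it as a composition exclusion.

```python
import unicodedata


def nfc(text):
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# e + COMBINING ACUTE ACCENT has a precomposed form, so NFC composes it.
composed = nfc("e\u0301")

# q + COMBINING ACUTE ACCENT has no precomposed form, so the
# combining character remains after NFC.
kept = nfc("q\u0301")

# U+2ADC FORKING is precomposed, but (assumption: it is in the
# composition exclusion list) NFC decomposes it and never recomposes it.
excluded = nfc("\u2adc")

for label, value in (("composed", composed),
                     ("kept", kept),
                     ("excluded", excluded)):
    print(f"{label:9} {codepoints(value)}")
```

So a server that enforces NFC still has to cope with combining
characters in file names.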


...
> without duplicate filenames. Hangul would immediately be ok
> without the need of jamo decomposition. And we are also very


Funny you should mention Hangul here.  Hangul is the most glaring
example where Unicode normalisation does NOT "normalise away"
multiple representations of the SAME spelling.  E.g. <gg><a>
and <g><g><a> represent EXACTLY the same syllable, not even a hint
of a difference (like width, font fixedness, or anything else),
but none of the Unicode normalisation forms map them to the same
representation (NFD, NFKD: no change; NFC, NFKC: <gga> and <g><ga>).
This is for historical reasons (note: there is NO syllable break
between the two <g>'s); but that does not make leaving the double
letters undecomposed the best way of handling Hangul.
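
The Hangul case above can be checked with a short sketch, again using
Python's unicodedata module (an assumption about tooling, not part of
the original discussion). Here <gg> is taken to be U+1101 HANGUL
CHOSEONG SSANGKIYEOK, <g> to be U+1100 HANGUL CHOSEONG KIYEOK, and <a>
to be U+1161 HANGUL JUNGSEONG A; the variable names are mine.

```python
import unicodedata

FORMS = ("NFD", "NFKD", "NFC", "NFKC")

GG = "\u1101"  # <gg>: HANGUL CHOSEONG SSANGKIYEOK
G = "\u1100"   # <g>:  HANGUL CHOSEONG KIYEOK
A = "\u1161"   # <a>:  HANGUL JUNGSEONG A

double_jamo = GG + A    # <gg><a>
two_jamos = G + G + A   # <g><g><a>


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def all_forms(text):
    """Map each normalisation form name to the normalised string."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


first = all_forms(double_jamo)
second = all_forms(two_jamos)

# Print both spellings side by side; no form makes them equal.
for form in FORMS:
    print(f"{form:5} {codepoints(first[form]):15} {codepoints(second[form])}")
```

In every form the two columns differ, so a file system that compares
normalised names would still treat the two spellings as distinct files.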

Also of interest here may be that, IIRC, HFS+ and UFS (the Apple
file systems) represent all file names in NFD (and for UFS: in UTF-8).
NFD, not NFC.


                Kind regards
                /kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
