On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:

I think basing those functions on code points should suffice.  I also
think as soon as strings are assigned or loaded from file, they should
be normalized. So two code points like the A and Umlaut code points
would become one.


How would one know which code points were originally decomposed and which weren't? Should it be impossible to save a file that demonstrates the different possible Unicode representations of e.g. ö? And should a loaded file that contained both forms really be automatically composed or decomposed in its entirety when it is saved again?

I know of no text editor that handles Unicode and automatically changes the representation of pre-existing characters when saving a document. And I would never want to use a text editor that does that by default.
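
To make the concern concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Free Pascal). It assumes the standard `unicodedata` module, and the names `composed`, `decomposed`, `mixed` and `code_points` are hypothetical. It shows that ö has two distinct code point sequences, and that normalizing a string containing both forms erases the information about which form each character originally had.

```python
import unicodedata

# Two representations of the same visible character "ö".
composed = "\u00f6"      # single code point: LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS


def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]


# A text that deliberately contains both forms, e.g. a file written
# to demonstrate the difference.
mixed = composed + decomposed

# Normalizing on load would force everything into one form.
nfc = unicodedata.normalize("NFC", mixed)  # everything composed
nfd = unicodedata.normalize("NFD", mixed)  # everything decomposed

print(code_points(mixed))
print(code_points(nfc))
print(code_points(nfd))

# The bytes written back to disk would differ from the bytes read.
print(mixed.encode("utf-8") == nfc.encode("utf-8"))
```

After either normalization there is no way to reconstruct `mixed` from the result, which is why normalizing automatically on assignment or load cannot be undone at save time.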

The .SaveToFile() methods could take an optional parameter to decide
if the normalized version of the string gets saved, or if it must be
split again - which I think Mac OS-X prefers.

It doesn't. All OS functions that return file/path names return decomposed (UTF-8) strings. They accept both composed and decomposed strings. Text files are text files and can have any encoding you want, with any combination of composed and decomposed characters.
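
One consequence, sketched below in Python under stated assumptions: since a name may come back from the OS in decomposed form while the program holds a composed one, comparisons can normalize temporary copies and leave the stored text alone. The helper `same_text` is hypothetical, and plain NFD is used as an approximation; the exact decomposition rules a given file system applies may differ.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings by canonical equivalence.

    Only temporary copies are normalized; a and b are left untouched,
    so whatever form they were loaded in can be saved back unchanged.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


typed_name = "\u00f6.txt"      # composed, e.g. as typed by a user
returned_name = "o\u0308.txt"  # decomposed, e.g. as returned by an OS call

print(typed_name == returned_name)           # plain comparison
print(same_text(typed_name, returned_name))  # equivalence-aware comparison
```

This keeps normalization a property of the comparison and not of the data, which matches the point that text files may contain any mix of composed and decomposed characters.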


Jonas
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
