On 2011-01-12 19:45:36 -0500, Michel Fortin <michel.for...@michelf.com> said:

A funny exercise to make a fool of an algorithm working only with code points would be to replace the word "fortune" in a text containing the word "fortuné". If the last "é" is expressed as two code points, as "e" followed by a combining acute accent (this: é), replacing occurrences of "fortune" with "expose" would also turn "fortuné" into "exposé", because the combining acute accent remains as the code point following the word. Quite amusing, but it doesn't really make sense that it works like that.
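To make the exercise concrete, here is a minimal Python sketch of it. The sample string and the helper name replace_respecting_marks are made up for illustration, and the guard is only an approximation: it checks the canonical combining class of the next code point, which is not a full grapheme cluster segmentation.

```python
import unicodedata

# "fortuné" spelled as "e" followed by U+0301 COMBINING ACUTE ACCENT.
text = "fortune et fortune\u0301"

# Naive code-point replacement: the accent is left behind and lands on "expose".
naive = text.replace("fortune", "expose")

def replace_respecting_marks(s, old, new):
    """Replace `old` with `new`, skipping matches followed by a combining mark.

    Hypothetical helper: a nonzero canonical combining class on the next
    code point is used as a rough stand-in for "still inside the same
    user-perceived character".
    """
    if not old:
        return s
    out = []
    i = 0
    while i < len(s):
        j = i + len(old)
        followed_by_mark = j < len(s) and unicodedata.combining(s[j]) != 0
        if s.startswith(old, i) and not followed_by_mark:
            out.append(new)
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

guarded = replace_respecting_marks(text, "fortune", "expose")
```

With this input, the naive version turns both words into "expose" and leaves the stray accent on the second one, while the guarded version leaves "fortuné" alone. Some characters that extend a grapheme have a combining class of zero, so a real implementation would want proper grapheme boundary rules rather than this check.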

In the case of "é", we're lucky enough to also have a pre-combined character to encode it as a single code point, so encountering "é" written as two code points is quite rare. But not all combinations of marks and characters can be represented as a single code point. The correct thing to do is to treat "é" (single code point) and "é" ("e" + combining acute accent) as equivalent.
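One way to get that equivalence is to normalize both strings before comparing them. The following is a sketch using Python's standard unicodedata module; the function name canonically_equal is an illustrative choice, not something from an existing library.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

def canonically_equal(a, b):
    """Compare two strings after decomposing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# A plain code-point comparison sees two different strings.
plain_equal = precomposed == decomposed

# After normalization they compare equal.
normalized_equal = canonically_equal(precomposed, decomposed)

# "t" + COMBINING CIRCUMFLEX ACCENT has no pre-combined form,
# so composing with NFC still leaves two code points.
t_circumflex = unicodedata.normalize("NFC", "t\u0302")
```

Composing to NFC would work just as well for the comparison. The last line shows why composing alone does not get rid of combining marks: some combinations simply have no single code point.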

Crap, I meant to send this as UTF-8 with combining characters in it, but my news client converted everything to ISO-8859-1.

I'm not sure it'll work, but here's my second attempt at posting real combining marks:

        Single code point: é
        e with combining mark: é
        t with combining mark: t̂
        t with two combining marks: t̂̃
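To check what actually survived the trip through a news client, one can list the code points of each sample. This is a small sketch; the strings are written with escapes, and the marks on "t" are assumed to be a circumflex (U+0302) and a tilde (U+0303).

```python
import unicodedata

samples = {
    "single code point": "\u00e9",
    "e with combining mark": "e\u0301",
    "t with combining mark": "t\u0302",
    "t with two combining marks": "t\u0302\u0303",
}

def describe(s):
    """Return the Unicode name of every code point in s."""
    return [unicodedata.name(ch) for ch in s]

for label, s in samples.items():
    print(f"{label}: {len(s)} code point(s): {describe(s)}")
```

Each sample displays as one character, but the lengths reported are 1, 2, 2 and 3 code points.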

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
