On 01/13/2011 01:51 AM, Michel Fortin wrote:
On 2011-01-12 19:45:36 -0500, Michel Fortin <michel.for...@michelf.com>
said:

A funny exercise to make a fool of an algorithm working only with code
points would be to replace the word "fortune" in a text containing the
word "fortuné". If the last "é" is expressed as two code points, as
"e" followed by a combining acute accent (this: é), replacing
occurrences of "fortune" by "expose" would also replace "fortuné" with
"exposé" because the combining acute accent remains as the code point
following the word. Quite amusing, but it doesn't really make sense
that it works like that.
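
The exercise above can be reproduced in a few lines. This is only a sketch in Python, not code from the thread: the sample sentence and the helper name `replace_whole_clusters` are made up for illustration, and checking `unicodedata.combining` is a rough stand-in for real grapheme segmentation (some marks have combining class 0 and would slip through).

```python
import unicodedata

# "fortuné" spelled with "e" followed by U+0301 COMBINING ACUTE ACCENT.
text = "fortune, fortune\u0301"

# A replace that only sees code points also rewrites the accented word,
# because the accent is just the next code point after the match.
naive = text.replace("fortune", "expose")

def replace_whole_clusters(text, old, new):
    """Hypothetical helper: replace `old` unless the match is immediately
    followed by a combining mark (nonzero canonical combining class)."""
    if not old:
        return text
    out = []
    i = 0
    while True:
        j = text.find(old, i)
        if j == -1:
            out.append(text[i:])
            break
        end = j + len(old)
        if end < len(text) and unicodedata.combining(text[end]) != 0:
            # Replacing here would separate the last letter from its mark,
            # so copy the matched text through unchanged.
            out.append(text[i:end])
        else:
            out.append(text[i:j])
            out.append(new)
        i = end
    return "".join(out)

guarded = replace_whole_clusters(text, "fortune", "expose")
```

Here `naive` ends in "expose" plus the stray accent, which renders as "exposé", while `guarded` leaves the accented word alone.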

In the case of "é", we're lucky enough to also have a pre-combined
character to encode it as a single code point, so encountering "é"
written as two code points is quite rare. But not all combinations of
marks and characters can be represented as a single code point. The
correct thing to do is to treat "é" (single code point) and "é" ("e" +
combining acute accent) as equivalent.
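
One common way to get that equivalence is to normalize both strings before comparing them. The sketch below uses Python's standard `unicodedata.normalize`; the function name `canonically_equal` is an assumption for illustration, not something proposed in this thread.

```python
import unicodedata

precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Raw comparison sees different code point sequences.
raw_equal = precomposed == decomposed

# After normalization the two spellings compare equal.
normalized_equal = canonically_equal(precomposed, decomposed)

# Not every combination has a pre-combined form: "t" + COMBINING CIRCUMFLEX
# ACCENT stays as two code points even in the composed form.
t_circumflex = unicodedata.normalize("NFC", "t\u0302")
```

Normalizing to NFD instead would work equally well for the comparison; what matters is that both sides use the same form.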

Crap, I meant to send this as UTF-8 with combining characters in it, but
my news client converted everything to ISO-8859-1.

I'm not sure it'll work, but here's my second attempt at posting real
combining marks:

Single code point: é
e with combining mark: é
t with combining mark: t̂
t with two combining marks: t̂̃

Works :-) But your first post worked for me as well: for instance, <<"é" ("e" + combining acute accent)>> was displayed as "é", a single accented letter. I guess your email client did not convert to ISO-8859-1 on sending, but on reading (mine is set to UTF-8).
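
Since both forms render the same, the display alone does not tell which one actually arrived. A quick way to check is to dump the code points. This is a small Python sketch; the helper name `describe` is made up here.

```python
import unicodedata

def describe(s):
    """Return a (hex code point, Unicode name) pair for each code point in `s`."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in s]

single = describe("\u00e9")           # single code point
combined = describe("e\u0301")        # e with combining mark
two_marks = describe("t\u0302\u0303") # t with two combining marks
```

Running `describe` on text pasted from a message shows whether an accented letter is one code point or a base letter followed by marks.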

Denis
_________________
vita es estrany
spir.wikidot.com
