Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Wed, 12 Jan 2011 16:55:50 -0800

On 2011-01-12 19:45:36 -0500, Michel Fortin <michel.for...@michelf.com> said:

A funny exercise to make a fool of an algorithm working only with codepoints would be to replace the word "fortune" in a text containing the word "fortuné". If the last "é" is expressed as two code points, as "e" followed by a combining acute accent (this: é), replacing occurrences of "fortune" by "expose" would also replace "fortuné" with "exposé" because the combining acute accent remains as the code point following the word. Quite amusing, but it doesn't really make sense that it works like that.
In the case of "é", we're lucky enough to also have a pre-combined character to encode it as a single code point, so encountering "é" written as two code points is quite rare. But not all combinations of marks and characters can be represented as a single code point. The correct thing to do is to treat "é" (single code point) and "é" ("e" + combining acute accent) as equivalent.
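
For anyone who wants to try the exercise, here is a rough sketch in Python (not D, and not anything from Phobos) of the replacement problem and the equivalence described above. The helper name replace_whole_clusters is hypothetical, and its test for "is the next code point a combining mark" relies on unicodedata.combining, which is only an approximation: a proper solution would segment the text into grapheme clusters.

```python
import unicodedata

# "fortuné" spelled with a combining acute accent (U+0301) after the "e".
text = "fortune\u0301"

# A code-point-level replace matches the first seven code points and leaves
# the accent behind, so it ends up attached to the last letter of "expose".
naive = text.replace("fortune", "expose")
print(naive == "expose\u0301")  # True: displays as "exposé"

# The two spellings of "é" differ as code point sequences...
precomposed = "\u00e9"
decomposed = "e\u0301"
print(precomposed == decomposed)  # False

# ...but compare equal once both are in the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def replace_whole_clusters(text, old, new):
    """Hypothetical helper: replace `old` only where the match is not
    immediately followed by a combining mark (nonzero combining class)."""
    if not old:
        raise ValueError("old must be non-empty")
    out = []
    i = 0
    while True:
        j = text.find(old, i)
        if j == -1:
            out.append(text[i:])
            return "".join(out)
        end = j + len(old)
        if end < len(text) and unicodedata.combining(text[end]) != 0:
            # The match's last letter carries a mark: not the same word.
            out.append(text[i:j + 1])
            i = j + 1
        else:
            out.append(text[i:j])
            out.append(new)
            i = end
```

With this guard, replace_whole_clusters("fortune fortune\u0301", "fortune", "expose") changes only the first word and leaves the accented one alone.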

Crap, I meant to send this as UTF-8 with combining characters in it, but my news client converted everything to ISO-8859-1.

I'm not sure it'll work, but here's my second attempt at posting real combining marks:


        Single code point: é
        e with combining mark: é
        t with combining mark: t̂
        t with two combining marks: t̂̃
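
To check whether samples like these survived transit, one can list the code points in each string. The sketch below is Python, and the exact code points are an assumption on my part: I take the marks on the "t" to be U+0302 (combining circumflex) and U+0303 (combining tilde), which is what they appear to be. The describe helper is hypothetical.

```python
import unicodedata

# Assumed spellings of the four samples (the code points are a guess
# based on how the characters render, not something stated in the post).
samples = {
    "single code point": "\u00e9",
    "e with combining mark": "e\u0301",
    "t with combining mark": "t\u0302",
    "t with two combining marks": "t\u0302\u0303",
}


def describe(s):
    """Return one 'U+XXXX NAME' entry per code point in s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


for label, s in samples.items():
    nfc = unicodedata.normalize("NFC", s)
    # Number of code points as written, and after NFC composition.
    print(label, len(s), len(nfc), describe(s))
```

Under these assumptions, NFC folds "e" + accent into a single code point, while both "t" samples keep their length because no pre-combined character exists for them.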

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
