On Fri May 19 19:45:43 CDT 2006, [EMAIL PROTECTED] wrote:
> There's no such thing as an accented letter in the Russian language.
> That was the exact point of my initial remark.
the text was /romanized/ russian names. it was not written in the cyrillic
alphabet.
>
> Now, if you allow me to educate myself in Unicode a little bit,
> I'm about to follow through with your example. Be patient with me ;-)
as long as you're patient with me.
>
> > suppose that U+x is the cp for the letter.
> > suppose U+y is the cp for the accent.
>
> Ok.
>
> > suppose that we're lucky and there exists U+w ≡ U+xU+y.
>
> Just to make sure I still follow: U+w is supposed to *visually*
> look like U+x followed by U+y, right ?
yes. they must be the same.
>
> > then U+w should be the same glyph as U+xU+y.
>
> The same glyph from a visual standpoint, right ?
a glyph IS the visual representation.
>
> > canonical composition would yield
> > compose(U+xU+y) → U+w
> > compose(U+w) → U+w
> > while canonical decomposition would yield
> > decompose(U+xU+y) → U+xU+y
> > decompose(U+w) → U+xU+y
>
> And that's exactly the place where I think Unicode goes against common
> sense and language rules. I would expect it to mandate that a *decomposable*
> character is supposed to be used over the decomposition. Which in your
> original example was the case.
rob agrees with you.
however, there is a big advantage to a composed character -- you don't
have to figure out how to stick the horn, breve, slash, &c on top of,
under, on the shoulder of, through, &c the original character. in plan 9,
characters are bitmaps, making this operation extra annoying. also, there
are no rules in unicode preventing /arbitrary/ compositions.
this is valid unicode
	u+0069 u+0300 u+0301 u+0302 u+0303
all those combining codepoints attach to the base cp u+0069. figure out
how to build that glyph.
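
the compose/decompose table above can be checked with a minimal sketch.
this assumes python's standard unicodedata module, and it assumes e plus
combining acute (u+0065 u+0301) as a stand-in for U+xU+y with u+00e9
playing U+w. the names X, Y, W, compose, decompose and codepoints are
made up for illustration; they are not from any library.

```python
import unicodedata

# assumption: e + combining acute stands in for the letter/accent pair.
X = "\u0065"   # U+x: the base letter, e
Y = "\u0301"   # U+y: the combining acute accent
W = "\u00e9"   # U+w: the precomposed character, e with acute

def compose(s):
    """canonical composition, i.e. normalization form NFC."""
    return unicodedata.normalize("NFC", s)

def decompose(s):
    """canonical decomposition, i.e. normalization form NFD."""
    return unicodedata.normalize("NFD", s)

def codepoints(s):
    """render a string as a list of codepoints, e.g. 'u+0065 u+0301'."""
    return " ".join(f"u+{ord(c):04x}" for c in s)

# the arbitrary stack from the text: i followed by four combining marks.
stacked = "\u0069\u0300\u0301\u0302\u0303"

if __name__ == "__main__":
    for label, s in (("U+xU+y", X + Y), ("U+w", W), ("stack", stacked)):
        print(label, "compose  ->", codepoints(compose(s)))
        print(label, "decompose ->", codepoints(decompose(s)))
```

on my reading of the normalization rules, compose(stacked) can fold at
most the first accent into a precomposed letter. the remaining marks stay
as combining codepoints, so whoever draws the glyph still has to stack
them.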
>
> "There are no accents in Russian language" (*)
>
now you're confusing language and alphabet! ☺
- erik