Re: Unicode: endpoint of evolution of encodings?

Pablo Saratxaga Fri, 19 Nov 2004 03:55:05 -0800

Kaixo!

[I don't reply to everything, it would be too long]

On Fri, Nov 19, 2004 at 01:47:36AM +0100, Danilo Segan wrote:

> encodings).  I want to type "letters", and display it using any of
> the scripts simply by changing a font.  I'm native Serbian, and most
> native Serbian speakers tend to think of it as a display property (you

Do they?
Non-native names are written differently in cyrillic and in latin;
for example "Chirac" vs "Ширак", or do people actually write "Žak Širak"
instead of "Jacques Chirac" when writing an article about the French
president in Serbian in latin script?

> Ok, read my "character" as "letter", if you use this definition of a
> character.  So yes, Unicode is a collection of script symbols, which
> you call characters, and I call glyphs :)

No, a character is an abstract entity.
A glyph is not abstract, it is the visual representation of a character
(or group of characters) with given font, style, weight, properties.

A "letter" is a language-related abstract concept
A "character" is a script-related abstract concept
A "glyph" is related both to language and script and also to a given font.

>> You wrongly see latin and cyrillic variants of Serbian as simple
>> differences in shape of the same characters; that is not so,
>> you should instead look at it as two orthographic variants.
> 
> If characters are defined as script elements, then sure (after all,

That is what it is imho.

> is independent of the script).  I was clearly talking about characters
> as letters, or elements used to write down a language.
> 
> If, OTOH, characters are defined as "the smallest component of written
> language that has semantic value; refers to the abstract meaning
> and/or shape, rather than a specific shape" (from Unicode Glossary,
> cited above), then I'm not wrong at all: "а"/"a" both are smallest
> components of written Serbian that have same semantic value,

Yes, but unicode is not about Serbian only; so you cannot interpret
that definition with such a narrow view.

Also, note how the semantic value is not about "language", but
about "*written* language", that implies script, imho.

> refer to same abstract meaning, but not the same shape (ok, they're 
> coincidentally the same shapes as well; I could have used д/d
> instead).  I.e. they're the one and single character.

But if you widen your interpretation to include another single
language (eg Russian, or English, or whatever) it won't work anymore;
particularly "л" is *not* the same abstract meaning as "l" in English.
(while Serbian can be written in both latin and cyrillic, English cannot
be written in cyrillic, that will be considered as wrong by most
people).

> attainable through composing mechanisms).  So, Unicode is a glyph
> repository, no matter what tricks you try to pull out :)

I would accept it is a collection of graphemes (plus a few combining
and modification characters), but not a glyph collection.
But note also that several of the encoded characters are there for
compatibility, and should not be used (that is the case of the latin
digraphs "lj" etc; you should not use them, you should use "l" and "j"
separately).
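
To illustrate the compatibility point, here is a minimal sketch in Python.
It assumes the standard unicodedata module and takes U+01C9 as the
single-code-point latin digraph "lj"; the variable names are only for
illustration. It shows that the digraph carries a compatibility
decomposition to plain "l" + "j", which NFKC normalization applies and NFC
does not.

```python
import unicodedata

# U+01C9: the latin small digraph "lj" encoded as one code point.
LJ_DIGRAPH = "\u01c9"

# The character database marks it as a compatibility character.
decomposition = unicodedata.decomposition(LJ_DIGRAPH)

# NFC leaves the digraph alone; NFKC replaces it by two separate letters.
nfc = unicodedata.normalize("NFC", LJ_DIGRAPH)
nfkc = unicodedata.normalize("NFKC", LJ_DIGRAPH)

print(decomposition)
print(len(nfc), len(nfkc))
```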

Ah, and note that, in the case of the digraphs, there is no single
casing pair as with cyrillic; proper casing of "lj" depends on context.
In cyrillic you only have a lowercase and an uppercase: љ/Љ, while in
latin you have *three* possibilities: lj/Lj/LJ.
In cyrillic you can have љанка Љанка ЉАНКА; in latin it is ljanka Ljanka
LJANKA. There is no one-to-one matching, but two-to-three, so you
cannot achieve your goal of "encoding of Serbian independent of cyrillic
or latin display" either, unless you encode three casing states for
each letter: lowercase, initial-uppercase, all-uppercase.
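
The two-to-three mismatch can be checked with a small Python sketch. It
uses the compatibility digraph code points (U+01C7 to U+01C9) purely to
make the three latin casing states visible, not as a recommendation to use
them; the helper name casings is hypothetical.

```python
import unicodedata

CYRILLIC_LJE = "\u0459"   # cyrillic small letter lje
LATIN_LJ = "\u01c9"       # latin small digraph lj (compatibility character)

def casings(letter):
    """Return the distinct lowercase, titlecase and uppercase forms."""
    return {letter.lower(), letter.title(), letter.upper()}

cyrillic_forms = casings(CYRILLIC_LJE)   # lowercase and uppercase only
latin_forms = casings(LATIN_LJ)          # lowercase, titlecase, uppercase

# The middle latin form is a dedicated titlecase letter (category "Lt").
title_category = unicodedata.category(LATIN_LJ.title())

print(len(cyrillic_forms), len(latin_forms), title_category)
```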

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.walon.org/pablo/          PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
