On Tue, Feb 07, 2006 at 01:35:40AM +0200, ????????? ???????????? wrote:
> >-- Many (maybe most) font creators made different glyphs for oxia
> >   and tonos (although others did not, see the Gentium font), because
> >   they were "looking at unicode". But, surely, that was the correct
> >   place to look?
> Well, there is no other way for modern Greek. There can be no distinction
> between tonos and oxia, nor can we have two different keycodes for
> the same character. Imagine what would happen if a Greek user used a
> polytonic keyboard to enter a filename.
> It's just a matter of fonts. If someone wants to write monotonic Greek,
> he/she is free to use any font he/she likes. But for polytonic Greek he/she
> has to use a polytonic font (one which defines the polytonic glyphs correctly).
> Font designers claim the opposite: that the user should keep oxia and
> tonos combinations distinct. But this is incorrect according to Unicode
> and, as I said, is extremely dangerous when mixed with modern Greek.
> 
> Then again, the actual reason is that Unicode canonical equivalence is not
> correctly implemented by either applications or fonts.

Another way of looking at it is that the Unicode people are stuck in
the world of Windows and word processors and can't see past it.
Clearly something like the filesystem, which deals with arbitrary
binary byte sequences (essentially anything aside from \0 and /),
cannot be expected to, and should not, make Unicode canonical
equivalence substitutions.
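
As a minimal sketch of why (POSIX calls, hard-coded UTF-8 byte
strings; the filenames themselves are made up): the kernel happily
creates two distinct files whose names are canonically equivalent in
Unicode but differ as byte sequences:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* "café" with precomposed U+00E9 (UTF-8: 0xC3 0xA9) */
        const char *nfc = "caf\xc3\xa9";
        /* "café" with "e" + combining acute U+0301 (UTF-8: 0x65 0xCC 0x81) */
        const char *nfd = "cafe\xcc\x81";

        /* Both creates succeed: the kernel compares raw bytes and
         * knows nothing of canonical equivalence. */
        int a = open(nfc, O_CREAT | O_EXCL | O_WRONLY, 0644);
        int b = open(nfd, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        printf("second create %s\n",
               b >= 0 ? "succeeded: two distinct files" : "failed");
        return 0;
    }

Run it and ls shows two visually identical directory entries.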

If you're actually worried about people using the 'bad' character
choices in filenames, a better solution would probably be to advise
people making fonts to render these characters with the replacement
character glyph, so that only the applications which understand
canonical equivalence would display anything reasonable at all. That
would be a good way to discourage their use. :)

[Here I'm talking about people making terminal fonts, GUI interface
element fonts, etc., not fonts for word-processing/print use, over
which we don't really have much influence.]

However, this issue gets much hairier with other canonical
equivalence issues, like combining vs. precomposed forms, canonical
ordering of combining characters, etc. I don't know any way to address
it except asking users not to be stupid. Somehow I expect the ones who
will be _typing_ filenames will be savvy enough to stick to sane
filename choices, and the rest will just select files from a
Qt/GTK/whatever dialog box.
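
The ordering case, for instance (a made-up pair of byte strings): the
two marks below have different combining classes (U+0328 ogonek is
class 202, U+0301 acute is class 230), so normalization sorts them
into one canonical order, yet both storage orders are valid input:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "a" + ogonek + acute vs. "a" + acute + ogonek: distinct
         * bytes, canonically equivalent after reordering */
        const char *v1 = "a\xcc\xa8\xcc\x81";
        const char *v2 = "a\xcc\x81\xcc\xa8";
        printf("byte-identical? %s\n",
               strcmp(v1, v2) == 0 ? "yes" : "no"); /* prints "no" */
        return 0;
    }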

It's important to remember that this is really nothing new with
Unicode. It's always been possible to make nasty filenames that look
equivalent but are not, for instance by embedding terminal escape
sequences in filenames...
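
A pre-Unicode sketch of the same trap (the filename is made up): a
name that renders exactly like "notes.txt" when printed raw to a
VT100-style terminal, because the trailing SGR reset is invisible:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* ESC [ 0 m at the end has no visual effect on the terminal */
        int fd = open("notes.txt\x1b[0m", O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) close(fd);
        return 0;
    }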

> According to Unicode, a process must not treat equivalent characters
> differently, nor assume that some other process does.

This requirement is vague and inherently impossible to satisfy if you
use a broad enough concept of 'a process'. For example, is it illegal
for strlen to return different numbers for strings that have the same
canonical representation but a different number of bytes?
:)
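
Concretely (a minimal sketch; the two byte strings are just the two
encodings of the same Greek letter): strlen cannot help but 'treat
equivalent characters differently':

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* U+03AC (alpha with tonos), precomposed: 2 bytes in UTF-8 */
        const char *nfc = "\xce\xac";
        /* U+03B1 + U+0301 (alpha + combining acute): 4 bytes */
        const char *nfd = "\xce\xb1\xcc\x81";
        /* canonically equivalent, yet this prints "2 vs 4" */
        printf("%zu vs %zu\n", strlen(nfc), strlen(nfd));
        return 0;
    }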

> Even more, a text may be automatically normalized at any time (without
> the user or any other program knowing it) by the system or an intermediate
> process, having some characters decomposed or replaced by their
> canonical equivalents.

Yes, lovely. A binary-clean text editor or hex editor that processes
the text as UTF-8 (or any other Unicode encoding) can trash the binary
file at any time. Just lovely. Moreover, guidelines like this
encourage implementors of UTF-8 text editors to make broken,
non-binary-clean implementations, and discourage anyone who wants a
binary-clean system from considering UTF-8.

Gross design mistakes like this, and the Windows/16bit-centricness of
the Unicode spec, have me largely convinced that UCS (ISO-10646) is
the standard we should follow for basic character handling under *nix,
rather than Unicode, and that Unicode should just be used as a guide
for supplemental functionality (such as case folding, collation, etc.)
in applications that need such features.

> >I hope there is a way to put the genie back into the bottle. Just making
> >the keyboard entry for oxia "hard, forcing people not to use it" does
> >not seem to be the right way.
> The correct way is the maturing of Unicode:
> Once all texts are normalized, all programs will become aware
> of character equivalence, and smart fonts will be used to decide
> which glyph best suits each case.

Normalization at display time to select a glyph image is a very good
idea. Normalization of the actual stored data is a horrible mistake.
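
That split is cheap to implement, too. A minimal sketch of
display-time folding (hypothetical renderer code; only the oxia/tonos
singleton equivalences are shown): map the polytonic oxia codepoints
onto their tonos equivalents during glyph lookup, leaving the stored
text untouched:

    #include <stdint.h>

    uint32_t fold_for_display(uint32_t cp)
    {
        switch (cp) {
        case 0x1F71: return 0x03AC; /* alpha with oxia   -> tonos */
        case 0x1F73: return 0x03AD; /* epsilon with oxia -> tonos */
        case 0x1F75: return 0x03AE; /* eta with oxia     -> tonos */
        case 0x1F77: return 0x03AF; /* iota with oxia    -> tonos */
        case 0x1F79: return 0x03CC; /* omicron with oxia -> tonos */
        case 0x1F7B: return 0x03CD; /* upsilon with oxia -> tonos */
        case 0x1F7D: return 0x03CE; /* omega with oxia   -> tonos */
        default:     return cp;     /* everything else unchanged */
        }
    }

Since this happens only in the renderer, a filename containing U+1F71
still round-trips byte-for-byte through the filesystem.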

> In the meantime, some font designers use this workaround to improve
> the display of their fonts, thus making the problem persistent.

:(

Rich


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
