On Wed, Dec 04, 2002 at 03:17:24PM -0500, Maiorana, Jason wrote:
> >The terminal
> >should renormalize everything (including pastes) to NFC.
> 
> Then how will I paste in some wacky invalid filename into
> my terminal in order, to say, rm it? Like I was saying,
> paste's should not be normalized.

I already explained this at length: ls (and other tools) should escape
"wacky" filenames using \x, \u and \U.  This is nothing new; ls already
escapes things, so it's just an extension of existing functionality.
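
To make the idea concrete, here is a rough sketch in Python of the kind
of escaping I mean.  The function name and the exact policy (which
characters count as "wacky") are my own assumptions for illustration;
this is not what ls does today.

```python
import unicodedata

def escape_filename(raw: bytes) -> str:
    """Hypothetical helper: return a printable, pasteable form of a raw filename."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Assumed policy: a name is "fragile" if NFC would change it, or if it
    # starts with a combining mark (which would attach to whatever precedes it).
    fragile = (
        unicodedata.normalize("NFC", text) != text
        or (text != "" and unicodedata.category(text[0]).startswith("M"))
    )
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\x%02x" % (cp - 0xDC00))   # invalid byte
        elif ch == "\\":
            out.append("\\\\")                      # keep the output unambiguous
        elif cp < 0x20 or cp == 0x7F:
            out.append("\\x%02x" % cp)              # ASCII control character
        elif cp < 0x80:
            out.append(ch)                          # printable ASCII
        elif fragile or unicodedata.category(ch).startswith("C"):
            out.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
        else:
            out.append(ch)                          # safe non-ASCII text
    return "".join(out)
```

With this policy an NFC name such as "caf\u00e9" is printed as-is, while
a decomposed name, or one containing a stray 0xe9 byte, comes out as
plain ASCII escapes that survive a normalizing terminal.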

Even if you don't normalize, unless ls does some quoting work, you're
not going to be able to paste all strange filenames.  One example, as I
mentioned, is a filename that starts with a combining character.
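
A toy model of the problem, in Python.  It assumes a terminal that
stores one base character per cell and attaches combining marks to the
preceding cell; to_cells() is a made-up name, and wide characters are
ignored to keep it short.

```python
import unicodedata

def to_cells(text):
    """Toy screen model (assumption): one cell per base character, with
    combining marks stored in the cell of the character before them."""
    cells = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and cells:
            cells[-1] += ch          # mark lands on the previous cell
        else:
            cells.append(ch)
    return cells

# ls-style output: two names separated by two spaces.  The second name
# starts with U+0301 COMBINING ACUTE ACCENT.
line = "a.txt  \u0301b.txt"
cells = to_cells(line)

# The accent ends up in the cell of the separator space, so no cell
# selection yields exactly the second filename.
from_b = "".join(cells[7:])       # selection starting at "b": mark is lost
from_space = "".join(cells[6:])   # selection starting one cell earlier:
                                  # mark is kept, plus an unwanted space
```

Neither selection is the real name "\u0301b.txt", with or without
normalization, which is why the quoting has to happen in ls.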

Also, it's very difficult for terminals to handle this consistently.  Is
an invalid UTF-8 string one column wide?  One column per byte?  There
are definitions (e.g. Markus has a page on it), but it's difficult enough
to get width right without having to deal with this.  It's also harder
to write a terminal implementation that remembers invalid sequences
on-screen so they can be copied later.  And every layer would have to
handle them identically (the terminal, terminal layers like Screen, and
mbswidth()), or they'd become desynced.
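
To show how easily layers can disagree, here is a small Python sketch
that counts columns for the same bytes under three plausible policies.
The policy names are invented for this example, and it assumes every
valid character is exactly one column wide (no wide or combining
characters).

```python
def widths(raw: bytes) -> dict:
    """Column count of `raw` under three hypothetical invalid-byte policies."""
    # Each undecodable byte becomes one lone surrogate in U+DC80..U+DCFF.
    text = raw.decode("utf-8", errors="surrogateescape")
    bad = [0xDC80 <= ord(ch) <= 0xDCFF for ch in text]
    valid = bad.count(False)          # assumption: one column per valid char
    invalid_bytes = bad.count(True)
    # A "run" is a maximal stretch of consecutive invalid bytes.
    runs = sum(1 for i, b in enumerate(bad) if b and (i == 0 or not bad[i - 1]))
    return {
        "per_byte": valid + invalid_bytes,     # one column per invalid byte
        "per_run": valid + runs,               # one column per invalid run
        "escaped": valid + 4 * invalid_bytes,  # shown as \xNN, four columns each
    }
```

For b"ab\xe2\x82" (a truncated three-byte sequence) the three policies
give 4, 3 and 10 columns.  If the terminal uses one and Screen or
mbswidth() another, the cursor position is off from that point on.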

In practice, since this (precise displaying of invalid UTF-8 sequences)
is a relatively obscure issue, that consistency will never happen, and
the result would be broken filenames that cause screen desyncs and can't
easily be referenced (e.g. to rm).

> Normalization for D has some serious drawbacks: if you were to try
> to implement, say vietnamese using only composing characters,
> it would look horrible. The appearance, position, shape, and size
> of the combining accents depends on which letter they are being
> combined with, as well as which other diacritics are being combined
> with that same letter.

That's entirely a rendering implementation detail; it should be easy for
the terminal's font renderer to normalize internally in whatever way is
most appropriate.
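
As a sketch of what "normalize internally" could look like (Python,
with a made-up function name; a real renderer would work on its own
cell structures): group each base character with its combining marks,
then compose the cluster to NFC before looking up a glyph, so a
precomposed glyph is used whenever the font has one.

```python
import unicodedata

def glyph_clusters(text):
    """Hypothetical renderer step: split text into base+marks clusters and
    compose each cluster to NFC for font lookup."""
    clusters = []
    for ch in text:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch       # mark belongs to the preceding base
        else:
            clusters.append(ch)
    return [unicodedata.normalize("NFC", c) for c in clusters]

# "Viet" with the e written as e + dot below + circumflex (decomposed).
decomposed = "Vie\u0323\u0302t"
clusters = glyph_clusters(decomposed)
```

Here the third cluster becomes the single code point U+1EC7, so the
renderer can draw the font's precomposed Vietnamese glyph no matter
which form the application sent.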

For which scripts do you think NFD would be more appropriate than NFC?
NFC seems to be fairly standard (de facto) in Unix.

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/