Re: intermediate summary (Re: filename encoding)

Marcin 'Qrczak' Kowalczyk Sat, 03 Feb 2001 03:10:16 -0800
Fri, 02 Feb 2001 20:34:47 +0000, Markus Kuhn <[EMAIL PROTECTED]> wrote:

> Thanks for illustrating so nicely the exact unmanageable and needless
> mess that we want to avoid here. But maybe you still haven't understood
> that you are talking to the World Mental Health Organization's global
> ISO2022/ISO8859/JIS/GB/KSC/KOI-8RCUVW/CPxxx eradication project here,
> which has very well understood what it wants to get away from, namely
> the above horror vision. Can we move this off linux-utf8 now please?

Sorry, the real world is more complex than having a single UTF-8. Currently
almost all applications on my computer use ISO-8859-2 internally and
externally, without any conversion. Most are not aware that characters
can be encoded in different ways at all.

Starting to use UTF-8 means introducing *more* conversions at first.
I can't say "from now on I will use UTF-8 only", because most programs
can't handle it, and because for mail & news, for example, I have to
read and write ISO-8859-2 now.

The only way to introduce UTF-8 into a working environment is to
design and implement a generic encoding-aware framework, making UTF-8
one of the available encodings, together with the encodings used
before. Then gradually move to UTF-8 where possible, using the
framework to keep everything in sync.
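
A minimal sketch of such a framework, assuming Python's built-in codecs
stand in for it. The function names and the LEGACY/TARGET constants are
hypothetical. Text is Unicode internally, and every boundary names its
encoding explicitly, so ISO-8859-2 and UTF-8 can coexist during the
migration:

```python
# Hypothetical names throughout; only the standard codec machinery is real.
LEGACY = "iso-8859-2"  # assumption: the encoding used before the migration
TARGET = "utf-8"       # the encoding we are gradually moving to


def decode_from(data: bytes, encoding: str) -> str:
    """Bytes from the outside world -> internal Unicode text."""
    return data.decode(encoding)


def encode_for(text: str, encoding: str) -> bytes:
    """Internal Unicode text -> bytes for a consumer that expects `encoding`."""
    return text.encode(encoding)


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two byte encodings by going through Unicode."""
    return encode_for(decode_from(data, src), dst)


# A mail body that arrives in the legacy encoding...
mail_body = "Zażółć gęślą jaźń".encode(LEGACY)
# ...is stored as UTF-8 locally...
as_utf8 = transcode(mail_body, LEGACY, TARGET)
# ...and converted back when a reply must go out in ISO-8859-2.
round_trip = transcode(as_utf8, TARGET, LEGACY)
```

The point of the design is that no component guesses: the encoding is a
parameter at every edge, so a program can be moved to UTF-8 one boundary
at a time.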

People can switch to a UTF-8 locale only if multibyte locale encodings
work at all in the applications they are interested in, and if explicit
use of other encodings is possible where needed (like mail & news).

Then, in some new places, we will discover that only Unicode is used
internally. When such places talk to each other, we can bypass the
generic framework and talk in Unicode directly. At most there will be
a conversion between different UTFs. For example, a Haskell<->Java
binding will of course not go through the default locale encoding,
and the same holds if a database interface or protocol specifies that
text is in UTF-8 and is accessed from Haskell or Java. Finally, in
some places there will be no need to provide any other interface than
just hardwired Unicode.
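
A conversion between UTFs can be sketched like this, assuming one side
hands over UTF-8 bytes and the other expects UTF-16 code units in
little-endian order. The function names are hypothetical. No locale
encoding is consulted at any point:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 (little-endian) bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


sample = "Zażółć".encode("utf-8")
wide = utf8_to_utf16le(sample)
back = utf16le_to_utf8(wide)
```

Both forms cover the same repertoire, so the conversion is lossless in
either direction for well-formed input.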

We are not at this stage yet. I have trouble finding a Unicode
terminal, Unicode editor, Unicode font, Unicode text printer, and
even a C library interface to convert between two important encodings:
UTF-whatever and the encoding which can be assumed by default in
byte streams (both encodings have portability problems). I mean,
there is trouble with everything.
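
For illustration, the missing conversion between UTF-8 and the default
byte-stream encoding might look like the sketch below. It uses Python's
locale module rather than a C interface, and the function names and the
optional `encoding` override are assumptions made for the example:

```python
import locale
from typing import Optional


def default_encoding() -> str:
    """Name of the encoding the current locale implies for byte streams."""
    return locale.getpreferredencoding(False)


def utf8_to_default(data: bytes, encoding: Optional[str] = None) -> bytes:
    """Convert UTF-8 bytes to the default (or an explicitly given) encoding.

    Characters the target cannot represent are replaced with '?', because
    a legacy 8-bit encoding covers only a small part of Unicode.
    """
    enc = encoding or default_encoding()
    return data.decode("utf-8").encode(enc, errors="replace")


def default_to_utf8(data: bytes, encoding: Optional[str] = None) -> bytes:
    """Convert bytes in the default (or given) encoding to UTF-8."""
    enc = encoding or default_encoding()
    return data.decode(enc).encode("utf-8")
```

With "iso-8859-2" passed explicitly, a letter such as "ł" maps to a
single byte, while a character outside that set, such as "€", comes out
as "?". That loss is one reason the legacy direction cannot simply be
hidden from the user.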

It's not easy to switch to Unicode.

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/