Hi,

At Fri, 2 Feb 2001 15:59:51 +0100 (CET),
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:

> This is unfortunate for Haskell, probably Java, and other languages which
> use Unicode wide characters internally. Because when names are physically
> stored in UTF-8 (a sample ext2 installation in the future) or UCS-2
> (VFAT), but the locale is e.g. ISO-8859-x and thus the filesystem is
> mounted with conversion to ISO-8859-x, handling filenames in these
> encodings loses data because of the bad intermediate form.

Though I don't know about Haskell, Java automatically converts strings
from/to the locale encoding on every I/O operation.  Thus, applications
don't need to be aware of the internal Unicode representation.  Tcl/Tk
is based on the same design.


> An alternative design would allow 1. as an additional option, preferable
> using wchar_t instead of UTF-8. 

wchar_t cannot be used, because it is incompatible with char.
(C++ allows multiple functions with the same name but different
prototypes, so a wchar_t version of fopen() would be possible in C++.
However, the kernel and libc are written in C, not C++.)

> I think that we must live with the fact that kernel-side encodings are
> specified and implemented very differently from libc encodings. There are
> modules for particular encodings and mount options telling which encoding
> to use.

Really?  Isn't it possible to use libc's iconv()?  Otherwise the kernel
would have to carry its own large conversion tables...

> I think 4. is the most compatible with the current world.

Note that 2. and 4. mean that the kernel must know each process's
LC_CTYPE locale.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/