Hi,
At Fri, 2 Feb 2001 15:59:51 +0100 (CET),
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> This is unfortunate for Haskell, probably Java, and other languages which
> use Unicode wide characters internally. Because when names are physically
> stored in UTF-8 (a sample ext2 installation in the future) or UCS-2
> (VFAT), but the locale is e.g. ISO-8859-x and thus the filesystem is
> mounted with conversion to ISO-8859-x, handling filenames in these
> encodings loses data because of the bad intermediate form.
Though I don't know about Haskell, Java automatically converts strings
to and from the locale encoding on every I/O operation, so application
code never needs to be aware of the internal Unicode representation.
Tcl/Tk is based on the same design.
> An alternative design would allow 1. as an additional option, preferably
> using wchar_t instead of UTF-8.
wchar_t cannot be used, because it is incompatible with char.
(C++ allows overloading, i.e. multiple functions with the same name but
different prototypes, so C++ could have a wchar_t version of fopen().
However, the kernel and libc are written in C, not C++.)
> I think that we must live with the fact that kernel-side encodings are
> specified and implemented very differently from libc encodings. There are
> modules for particular encodings and mount options telling which encoding
> to use.
Really? Isn't it possible to use libc's iconv()? Then the kernel has to
carry large conversion tables...
> I think 4. is the most compatible with the current world.
Note that 2. and 4. mean that the kernel must know the software's
LC_CTYPE locale.
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/