However, when you recommend to an application author that his application
should treat all filenames as UTF-8, this is not an improvement. It is a
no-op for the UTF-8 users but breaks the world for the EUC-JP and
KOI8-R users.
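(To make the breakage concrete, a small Python sketch; the bytes are a
made-up KOI8-R filename, not from the original post:

    name = b'\xc6\xc1\xca\xcc'       # "файл" ("file") in KOI8-R

    print(name.decode('koi8-r'))     # -> файл
    name.decode('utf-8')             # UnicodeDecodeError: 0xC6 starts a
                                     # two-byte sequence, but 0xC1 is not
                                     # a valid continuation byte

The same bytes are a perfectly good name in one locale and undecodable
garbage in the other.)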


Perhaps that is too conservative.

Any effort spent supporting legacy encodings, or being prepared to perform
charset conversions on input, seems wasteful to me (even effort spent
supporting alternative Unicode encodings). Locales are still useful, but I
think locales should not specify an encoding.
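As a rough Python sketch of the conversion dance I mean (/etc/motd here
just stands in for arbitrary input):

    import locale

    # The per-locale conversion step: ask the locale which encoding
    # incoming bytes are in, then convert everything on input.
    locale.setlocale(locale.LC_ALL, '')    # honour $LANG / $LC_ALL
    enc = locale.getpreferredencoding()    # e.g. 'EUC-JP', 'KOI8-R', 'UTF-8'

    raw = open('/etc/motd', 'rb').read()
    text = raw.decode(enc)                 # works only if the file really
                                           # matches the reader's locale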

There are a lot of benefits to be gained, in the form of simplicity and
interoperability, when applications are free to assume that all text they
might encounter will be UTF-8 encoded. Common protocols and file
formats shouldn't even have to specify what encoding text is in, IMO.
By specifying it, they are allowing for the possibility that it might be
different, and that an application may have to deal with charset
conversion and so on.
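If UTF-8 can simply be assumed, the same read collapses to one line, with
no negotiation at all (a sketch under that assumption):

    # No locale query, no conversion step: bytes are UTF-8, full stop.
    text = open('/etc/motd', encoding='utf-8').read()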

System-wide messages, the login screen, the filesystem, GECOS fields,
.plans, /etc/issue, /etc/motd, etc. are examples where I think a commonly
enforced encoding would be beneficial.

The alternative, such as tagging metadata onto the filesystem layer,
individual inodes, individual file metadata descriptors, etc., seems far
uglier in comparison. (Imagine a file whose name is in one encoding,
metadata in a second, and content in yet a third. :()
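A rough sketch of that mess on a POSIX system, where filenames are just
byte strings (the directory and names are made up for illustration):

    import os

    os.mkdir('demo')
    # Nothing stops one directory from mixing encodings:
    open(os.path.join(b'demo', 'ファイル'.encode('euc-jp')), 'wb').close()
    open(os.path.join(b'demo', 'файл'.encode('utf-8')), 'wb').close()

    for name in os.listdir(b'demo'):        # raw bytes back from the kernel
        try:
            print(name.decode('utf-8'))
        except UnicodeDecodeError:
            print(name, '<- not valid UTF-8')   # the EUC-JP name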

IDN URLs are another good example. It's clearly preferable to have a URL
be both canonical (byte for byte) and readable (i.e. in non-Punycode form).
If a user provides an IDN URI to the system or to another user in an
unexpected encoding, the resource would be unresolvable.
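A sketch with Python's built-in 'idna' codec (IDNA 2003), using a made-up
hostname:

    # Readable form <-> canonical (Punycode) form, assuming UTF-8 input:
    print('bücher.example'.encode('idna'))   # b'xn--bcher-kva.example'

    # If the readable form arrives in an unexpected legacy encoding,
    # the canonical form can no longer be recovered:
    raw = 'bücher.example'.encode('latin-1') # what a Latin-1 sender emits
    raw.decode('utf-8')                      # UnicodeDecodeError -> the
                                             # resource is unresolvable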

