From: Bram Moolenaar <[EMAIL PROTECTED]>
Andries Brouwer wrote:
> There is a potential big problem in this area. If the kernel doesn't
> do conversion, will all applications have to do this?
>
> No. A filename is just a sequence of bytes - no conversion required
> or desirable.
From the point of view of the kernel it's just a sequence of bytes ...
From the point of view of the user the bytes form characters with a
specific meaning. If you use the wrong character set, that meaning
is lost.

True. But the user uses the right locale, so there is no problem.
At least no problem that we can do anything meaningful about.

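To make the two points of view concrete, here is a minimal Python sketch. The file name is made up for illustration (the thread names only the character sets), and the codec names are Python's own.

```python
# Hypothetical file name; only the charsets come from the discussion.
raw_name = "файл".encode("koi8_r")   # the four bytes stored in the directory entry

# The stored bytes never change. Only the reader's interpretation does.
as_koi8 = raw_name.decode("koi8_r")     # user in a KOI8-R locale sees "файл"
as_latin1 = raw_name.decode("latin_1")  # user in an ISO 8859-1 locale sees "ÆÁÊÌ"

# Nothing is lost at the byte level: re-encoding what the second user sees
# gives back exactly the stored name.
round_trip = as_latin1.encode("latin_1")
```

Both users can open, rename, or delete the file, because every such operation passes the same bytes back to the kernel. Only the legibility differs.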
Conversion is required to keep the meaning.

No. Conversion is impossible.
Filenames are not only for the user, they are also parsed by shells
and operating systems. You will be unable to convert filenames
and not introduce bugs much worse than the legibility problem.
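A small sketch of one way conversion can go wrong, assuming a hypothetical helper `convert_name` that re-encodes a name from one character set to another. Neither the helper nor the file names come from this thread.

```python
def convert_name(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode a file name from charset src to charset dst (hypothetical helper).

    Characters that do not exist in dst are replaced by '?'.
    """
    return raw.decode(src).encode(dst, errors="replace")


# Two distinct, made-up names as a KOI8-R user would have created them.
names = ["файл".encode("koi8_r"), "тест".encode("koi8_r")]

# ISO 8859-1 has no Cyrillic letters, so every letter is replaced.
converted = [convert_name(n, "koi8_r", "latin_1") for n in names]

# Both names collapse to b"????": two different files now share one name,
# and '?' is itself a wildcard character to the shell.
collision = converted[0] == converted[1]
```

Using strict error handling instead of `errors="replace"` avoids the collision but raises `UnicodeEncodeError`, so the name cannot be converted at all.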
> Linux is a multi-user system. Different users with different nationalities
> use different locales. These Russians all want KOI-8, while the Danes
> want ISO 8859-1. Most filesystem types do not store the character set
> the filename is supposed to be in, and most users do not know enough
> to supply such information.
Well, it's about time we start this then.

No, because one single ext2 filesystem has both the files of this Dane
and of these Russians. All are happy today, but as soon as you write
somewhere that it contains filenames in KOI-8, the Dane will be very unhappy.

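The same situation as a rough sketch. `FS_CHARSET` stands for a hypothetical per-filesystem label; no such field exists in ext2, and both file names are invented.

```python
# One directory holding names created by two users in two locales.
danish = "blåbær".encode("latin_1")   # created under ISO 8859-1
russian = "файл".encode("koi8_r")     # created under KOI8-R
directory = [danish, russian]

# Hypothetical label written "somewhere" on the filesystem.
FS_CHARSET = "koi8_r"

# Every name is now interpreted according to the label.
shown = [name.decode(FS_CHARSET) for name in directory]

russian_ok = shown[1] == "файл"    # the Russian name reads correctly
danish_ok = shown[0] == "blåbær"   # the Danish name does not
```

KOI8-R assigns a character to every byte value, so decoding the Danish name raises no error. It silently turns into different text, which is harder to notice than a failure.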
If we are going to introduce UTF-8 for file names (which is mostly a good
idea), there will be a conflict with the ISO-8859 names currently used
(especially in Europe). If this problem isn't solved properly,
users will not switch to UTF-8. That's why this problem
needs to be tackled and discussed on this list.

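The conflict can be seen at the byte level. This is only a sketch: the helper `is_valid_utf8` and the sample names are assumptions made for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A typical ISO 8859-1 name is not valid UTF-8: byte 0xE5 announces a
# three-byte sequence, but an ASCII letter follows it.
legacy = "blåbær".encode("latin_1")   # b'bl\xe5b\xe6r'
legacy_valid = is_valid_utf8(legacy)  # False

# Some ISO 8859-1 names are also valid UTF-8, so the bytes alone cannot
# tell which reading was intended.
ambiguous = "Ã©".encode("latin_1")          # b'\xc3\xa9'
ambiguous_valid = is_valid_utf8(ambiguous)  # True
as_utf8 = ambiguous.decode("utf-8")         # "é"
```

A program that expects UTF-8 therefore rejects or mangles the first name and silently reinterprets the second.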
I maintain: (1) There is no problem. Or (2) In case you think that
there is, it cannot be solved. And (3) filenames are the least of
your worries. Yes, I see filenames in interesting character sets.
(Example: old DOS distributions sometimes have a sequence of filenames
with mostly line drawing characters, so that a DIR command in that
directory will show you some boxed text. But only in CP437 or so.)
But the only really interesting part is the file contents.
As soon as people use UTF-8 for that, the filenames will follow.

If the encoding is known, conversion can be done when required.
Where this happens remains to be decided, although I wouldn't be
surprised if someone had already solved this somewhere.
Isn't it done for CD-ROM filesystems already?

Various Microsoft filesystems record the encoding used.

Andries
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/