On Sun, 28 Dec 2003, Nick Ing-Simmons wrote:

> Jungshik Shin <[EMAIL PROTECTED]> writes:
> >
> >   Then, he should switch to en_GB.UTF-8.
>
> I probably will.

Good!

> >Besides, he implied that
> >he still uses ISO-8859-1 for files whose names can be covered by
> >ISO-8859-1, which is why I wrote about mixing up two encodings
> >in a single file system _under_ his control.
>
> There is a tendency for programs to assume that the locale's encoding
> is used for the contents of the file. In the UK there are a LOT of files
> which are not UTF-8 but iso8859-1 or iso8859-15.

Sure, there are tons of text files in EUC-JP, GB2312, EUC-KR,
ISO-8859-7, Windows-1251, ISO-8859-1, TIS-620, and KOI8-R. Switching to
a UTF-8 locale means converting them all to UTF-8 (which is a one-time
cost), as well as their names. I did that almost two years ago, and so
have others. If you want to keep them in ISO-8859-1/15, fine. That's
your choice, but please don't blame programs (or their tendency) for
making a fair-enough assumption when
*** NO OTHER EXTERNAL information is available ***.
Not all files are under your control? That's when 'additional external
information' comes onto the scene. Computers are stupid. You know that
well. Oftentimes, you have to help them instead of being helped by them.

> assumptions are "mostly harmless". If I switch to a UTF-8 locale and
> a stupid program dies because I spelt naive correctly in 8859-1
> and that is a UTF-8 coding violation I don't gain much.

You're not supposed to do that if you're in a UTF-8 locale. Why would
you want to use anything other than UTF-8 if you like Unicode/UTF-8 so
much?

> >  Moreover, why would you think that en_GB.UTF-8 locale gives him the
> >time and date format NOT suitable for him? You're making a mistake of
> >binding locale and encoding. Encoding should never be a part of the
> >locale definition.
>
> That is EXACTLY the point Jarkko and I are making. The locale setting
> really tells you NOTHING about the encoding.

So, what is nl_langinfo(CODESET) for?

> So when presented with
>
>   if (-d "\x{20ac}4") ...
>
> how is "locale" supposed to help poor Joe in his en_US.utf8 locale looking
> at a sub-dir created by Kurt in [EMAIL PROTECTED] or was it Karl in de_DE.utf8

How could it? No way. It CANNOT. Have I ever said it could? Absolutely
not. It's YOUR responsibility to take care of the mess that was created
by you or your colleagues. You have to pay the price for mixing up
multiple encodings (even if it was your friends/colleagues who created
the files, you're the one trying to access them, so you have to make it
work by providing additional information; otherwise, programs cannot
help resorting to the locale-based default). For what reason do you
think I proposed a set of options that you agreed would work, more or
less?

> >  Before writing that, please read the man page of 'smbmount' and
> >'mount' if Linux system is available to you. They're not environment
> >variables.
> I think you are on "our" side.

Sure, I am, but I'm afraid you learned 'too many lessons' from Perl 5.8.

Jungshik
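
P.S. To make the "locale-based default plus external information" point
concrete, here is a minimal sketch. It is in Python rather than Perl,
and the helper names locale_codeset() and decode_bytes() are
hypothetical, invented for this illustration only. It assumes the word
in Nick's example was spelt with i-diaeresis (byte 0xEF in ISO-8859-1),
and it assumes a Unix-like system where nl_langinfo is available.

```python
import locale
from typing import Optional


def locale_codeset() -> str:
    """Return the codeset of the user's locale, as nl_langinfo(CODESET) reports it.

    Hypothetical helper; Unix-only, since locale.nl_langinfo is not
    available on every platform.
    """
    locale.setlocale(locale.LC_CTYPE, "")  # adopt the locale from the environment
    return locale.nl_langinfo(locale.CODESET)


def decode_bytes(raw: bytes, declared: Optional[str] = None,
                 default: str = "utf-8") -> str:
    """Decode raw bytes strictly.

    `declared` stands for the 'additional external information' about the
    encoding; when it is missing, fall back to `default` (in real use one
    might pass locale_codeset() here).
    """
    return raw.decode(declared or default)


# The ISO-8859-1 spelling of "naive" with i-diaeresis: b'na\xefve'
latin1_word = "na\u00efve".encode("iso-8859-1")

try:
    decode_bytes(latin1_word)  # no external information, UTF-8 default
except UnicodeDecodeError:
    print("not valid UTF-8 without extra information")

# With the encoding supplied externally, the same bytes decode fine.
print(decode_bytes(latin1_word, declared="iso-8859-1"))
```

Calling decode_bytes(raw, default=locale_codeset()) would give the
locale-based default; passing declared=... overrides it whenever the
real encoding is known from somewhere else.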