> Portability is not a sufficient excuse though. There are bugs, like

That's right, we haven't fixed things because we are lazy and stupid.
How did you guess?
> that with double recoding, or with $ARGV[0] not being equivalent to
> substr($ARGV[0], 0).

What substr() example are you referring to here? I cannot find this in
your recent messages.

> The API is, I'm afraid, not good enough, even if we ignore the old mode
> of manipulating data in its external encoding. Namely, it doesn't
> distinguish specifying the encoding of the script source (which depends
> on where it has been written) from specifying the encoding that the
> script should assume on STDIN/STDOUT/STDERR and other places (which
> depends on where it is being run). Well, other places when implemented,
> assuming it will indeed be triggered by the 'encoding' pragma.

You may consider the encoding pragma broken for your uses, and that is
fine, but I have to point out that many people are happily using it.
If your environment is such that your script is in encoding X and your
utilities operate in encoding X, all is fine. It's when you mix
encodings that things get murkier. Take for example the output of
qx(): you may declare somehow that it is in UTF-8, but the moment some
utility behaves differently and spits out Latin-1 or Latin-2 or SJIS,
you are screwed.

> I hope the -C flag is considered a temporary hack, to be eventually
> replaced with something which supports other encodings and not only
> UTF-8.

Possibly. It was an explicit solution for the much greater brokenness
that resulted from assuming implicit UTF-8 from locales.

> use encoding files => "ISO-8859-2";
> use encoding terminal => "UTF-8";

What do you mean by "terminal"? The STD* streams or /dev/tty?

> use encoding filenames => "ISO-8859-1";
> use encoding env => "locale";

Something like that would be nice, yes. Someone needs to implement it,
though, and that's the problem.

> We should think how it interacts with locale-aware behavior of
> functions. Without 'use locale' and other pragmas it's clear: Perl
> consistently assumes that every text is ISO-8859-1. When something like

Well, no.
In that case Perl assumes that everything is in whatever 8-bit
encoding the platform happens to be using, with the exception that
/\w/ and so forth only implement the character set of ASCII (in
effect, the raw underlying <ctype.h> API).

> 'use encoding' is in effect, Perl still interprets the scalars in the
> same way, but treats them differently when they interact with the
> world.
>
> But with 'use locale' it assumes that non-UTF-8 scalars are in the
> current locale encoding, which is incompatible with the assumptions
> taken when UTF-8 scalars and non-UTF-8 scalars are mixed. So it will
> probably never work together. If 'use locale' includes some essential
> features besides the treatment of texts, like date/time formatting,
> it should be available by other means, without at the same time
> causing ord(lc(chr(161))) to be equal to 177, which doesn't make
> sense if character codes are interpreted according to Unicode. It
> implies that when localized texts are taken from the system, they
> must be decoded from the locale encoding.

If you really do have a Grand Plan of how to integrate locales and
Unicode happily, congratulations.

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.
It is 'dead'." -- Jack Cohen
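The ord(lc(chr(161))) == 177 result discussed above is ISO-8859-2 case
mapping: in Latin-2, byte 161 is 'Ą' and its lowercase 'ą' is byte 177,
whereas under Unicode code point 161 is '¡', which has no case mapping
at all. A quick sketch of that character-set arithmetic, written in
Python purely as a stand-in for the Perl 'use locale' behaviour, since
Python's standard codecs make the two interpretations easy to compare:

```python
# Under ISO-8859-2, byte 161 decodes to 'Ą' (U+0104); lowercasing it
# and re-encoding yields byte 177 ('ą'), which is the locale result
# ord(lc(chr(161))) == 177 described in the message above.
latin2 = bytes([161]).decode("iso-8859-2")
print(latin2)                                  # Ą
print(latin2.lower().encode("iso-8859-2")[0])  # 177

# Under Unicode (and Latin-1), code point 161 is '¡', which has no
# lowercase mapping, so the two interpretations cannot agree.
print(chr(161).lower() == chr(161))            # True
```

This is exactly the clash being argued about: the same scalar value 161
names two different characters depending on whether it is read as a
locale byte or as a Unicode code point.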