In a message of Mon, 16-08-2004, 16:50 +0100, Nick Ing-Simmons wrote:

> So in my speech synthesis stuff I had:
>
>   use encoding qw(iso-8859-15);
>
> And then it worked right even if I happened to run it in en_GB.utf8
> that day.

Places where texts are exchanged are divided into three groups:

1. The encoding must be specified externally or guessed.
2. The protocol includes specifying the encoding of its data.
3. The encoding is fixed by the protocol.

Examples:

1. Local filenames on Linux, @ARGV, local text file contents,
   script source, sockets by default, IRC.
2. Many Internet protocols: WWW, email, usenet. XML files.
3. Gtk2 API (UTF-8), WinNT filenames (UTF-16).

There are also two models of how a Perl script may operate, which
had better not be mixed in one program:

A. The old model: it tries to work on the original encoding of the
   data. It uses non-UTF-8 scalars exclusively if the encoding is a
   byte encoding other than UTF-8, and UTF-8 scalars if it's UTF-8.
   Some things break for multibyte encodings other than UTF-8
   (e.g. regexps).
B. The new model: it uses Unicode internally, which is physically
   represented by non-UTF-8 scalars if the text happens to fit in
   ISO-8859-1 and by UTF-8 scalars otherwise.

Making all combinations of model and group (A1 through B3) work is a
lot of work. A1 was simple when it worked. B3 is also reasonably
simple, and this is the model the world should generally aim at, or at
least B2.

The problem with switching from A to B is that B1 is not as simple to
handle as A1. Many sources are still of kind 1, so there is a tendency
to use the A model, especially when programming in the C language.
Newer languages often try to use the B model exclusively (Java, .NET),
because B2 and B3 are better than A2 and A3.

In order for Perl to be usable in model B, it should try harder to make
B1 work, because many places where texts are exchanged are still of
kind 1. How well B1 works determines how widely Unicode is usable.
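
A minimal sketch of the B1 case, assuming the Encode and I18N::Langinfo
modules that ship with Perl. Bytes from a group-1 source are decoded into
Unicode using an explicitly supplied encoding, falling back to the
locale's codeset when none is given. The helpers group1_encoding and
decode_group1 are hypothetical names made up for this illustration, not
an existing API, and Encode is not guaranteed to recognize every codeset
name a locale reports.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: choose the encoding for a group-1 source.
# An explicit override wins; otherwise use the locale's codeset.
sub group1_encoding {
    my ($override) = @_;
    my $name = defined $override ? $override : langinfo(CODESET);
    my $enc  = find_encoding($name)
        or die "Unknown encoding '$name'\n";
    return $enc;
}

# Hypothetical helper for model B: turn external bytes into a
# Unicode character string as early as possible.
sub decode_group1 {
    my ($bytes, $override) = @_;
    return group1_encoding($override)->decode($bytes);
}

# "10 EURO SIGN" as ISO-8859-15 bytes and as UTF-8 bytes.
my $latin9_bytes = "10 \xA4";
my $utf8_bytes   = "10 \xE2\x82\xAC";

my $from_latin9 = decode_group1($latin9_bytes, 'iso-8859-15');
my $from_utf8   = decode_group1($utf8_bytes,   'UTF-8');

# After decoding, the original byte encoding no longer matters.
print "same text\n" if $from_latin9 eq $from_utf8;           # prints "same text"
print length($from_utf8), " characters\n";                   # prints "4 characters"
printf "last char is U+%04X\n", ord(substr($from_utf8, -1)); # prints "last char is U+20AC"
```

Calling decode_group1($bytes) without the second argument takes the
encoding from the locale instead, which is only as reliable as the
locale settings themselves.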

Ideally places in group 1 should have their encoding settable
individually or in related clusters, with a common default taken from
the locale. The common default might also be settable, because taking
the encoding from the locale might not be reliable.

Places in group 2 might sometimes be overridable individually, but the
default should come from the protocol rather than from the locale, and
being able to override is not as important as in group 1. It's probably
enough to be able to see raw data without recoding as an alternative to
applying the encoding from the protocol.

Places in group 3 should not be overridable; they should just work.
It's important that they get the correct encoding even if Perl is used
according to model A, if this model is still considered fully supported.

--
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/