Re: Reading/writing non-Unicode files with perl5.8?

2003-01-14 Thread Earl Hood
On January 14, 2003 at 10:58, Nicholas Clark wrote:

 RedHat 8 defaults to setting UTF8 locales.
 UTF8 locales cause perl5.8 to switch to Unicode mode, because perl assumes
 that you meant to set a UTF8 locale.

I commented in a different post that, IMHO, Perl 5.8 is incorrect in
automatically going into UTF8 mode if the environment locale is
UTF8 but no 'use locale' pragma exists in the script:
http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-01/msg3.html

And Jarkko submitted a problem ticket about it:
http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-01/msg4.html

The main problems are that the behavior is inconsistent with how
other locales are handled, it contradicts the perllocale manpage, and
it quietly causes script compatibility problems that may not be easy
for script authors to track down.
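
For anyone trying to track this down, a quick check of whether the
environment selects a UTF8 locale might look like the sketch below.
The function name is hypothetical, the match is a plain /utf-?8/i
pattern, and the LC_ALL, LC_CTYPE, LANG precedence is the usual POSIX
order, which I am assuming here rather than claiming it is the exact
test perl makes internally.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of the environment variable that
# selects a UTF-8 locale, or undef if the effective locale is not UTF-8.
# Assumes POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
sub utf8_locale_source {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        # Unset or empty variables are skipped.
        next unless defined $value && length $value;
        # The first non-empty variable decides; later ones are ignored.
        return $value =~ /utf-?8/i ? $var : undef;
    }
    return undef;
}

my $source = utf8_locale_source(\%ENV);
print defined $source
    ? "UTF-8 locale selected via $source\n"
    : "no UTF-8 locale in the environment\n";
```

Running this at the top of a misbehaving script at least shows whether
the environment is a likely cause.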

 My personal opinion is that it was premature of RedHat to make RedHat 8.0
 *default* to using UTF8 locales, given the general state of UTF8 support
 in most programs running on Linux.

I agree, especially if it does so quietly.  If the installer offered
it as an option, with a proper warning that programs may not
support UTF8 locales, then I would be okay with it.

--ewh
-- 
Earl Hood, [EMAIL PROTECTED]
Web: http://www.earlhood.com/
PGP Public Key: http://www.earlhood.com/gpgpubkey.txt



Re: CGI and UTF

2003-01-05 Thread Earl Hood
On January 5, 2003 at 05:42, Jarkko Hietaniemi wrote:

  This is Bad Juju (tm). It _guarantees_ script breakage (potentially
  silently!) for Unix people doing _anything_ but ASCII text manipulation.  
 
 I repeat: I don't think you can do more than ASCII by hanging on tooth
 and nail to the "everything is bytes" credo.

This statement assumes that one is working with characters.  It is
common to use regexes and other operators (substr, index, et al.)
directly on binary data.
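
As an illustration of what I mean, here is a minimal sketch that
treats a string purely as bytes.  The record layout (2-byte magic,
2-byte big-endian length, payload) is made up for this example and
does not correspond to any real format.

```perl
use strict;
use warnings;

# Hypothetical binary record: 2-byte magic, 2-byte big-endian payload
# length, then the payload itself.  None of it is text.
my $record = "\xFF\xD8" . pack('n', 4) . "\x00\xE9\x80\xFF";

# Regex match against raw bytes at the start of the record.
my $has_magic = $record =~ /\A\xFF\xD8/ ? 1 : 0;

# substr and unpack pull out the length field and the payload.
my $len     = unpack('n', substr($record, 2, 2));
my $payload = substr($record, 4, $len);

# index locates a single byte value inside the payload.
my $pos = index($payload, "\xE9");

printf "magic=%d len=%d pos=%d\n", $has_magic, $len, $pos;
```

Code like this only works if every offset and length counts bytes,
which is why a silent switch to character semantics on input matters.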

 I repeat: all your filehandles are still 'binary' unless you either
 explicitly (binmode) or implicitly (locale) command them not to be.
 If you try to push Unicode (data marked as UTF-8, such as characters
 beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
 If you do not like the locale implicit switching, reset your locale
 to something that does not match /utf-?8/i before running the script.

I think this reasoning is flawed since it assumes the author of
the script has complete control over the environment.  For example,
the script can be used by others in environments the author does not
control.  Therefore, older programs can quietly break, or behave
differently.
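
About the only thing an author can do from inside the script is to
state the intent explicitly on every filehandle.  A minimal sketch,
assuming a perl with PerlIO layers (5.8 or later); the helper name and
the sample data are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a byte string to a temporary file and read
# it back, calling binmode on both handles so that no layer implied by
# the environment is left in effect.
sub roundtrip_bytes {
    my ($bytes) = @_;

    my ($out, $path) = tempfile(UNLINK => 1);
    binmode($out);                      # explicit byte semantics on output
    print {$out} $bytes;
    close($out) or die "close $path: $!";

    open(my $in, '<', $path) or die "open $path: $!";
    binmode($in);                       # explicit byte semantics on input
    local $/;                           # slurp the whole file
    my $data = <$in>;
    close($in);
    return $data;
}

my $latin1 = "caf\xE9\n";               # 5 bytes of ISO-8859-1 text
my $copy   = roundtrip_bytes($latin1);
print length($copy), " bytes read back\n";
```

That helps for new code, but it does nothing for the scripts already
deployed that were written before any of this was necessary.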

According to the perllocale manpage, locale should have no effect
unless the 'use locale' pragma is specified.  It appears from
Benjamin's script that he is not using the pragma, so even if the
environment has a utf-8 locale, the script should be unaffected.

--ewh