On Wed, Mar 16, 2005 at 10:23:01AM +0000, [EMAIL PROTECTED] wrote:

> LANG is set to en_GB.
> With some messing about I have managed to create an en_GB.utf8.
> Setting LANG to that makes no difference to the perl output, as does setting 
> LC_ALL.
> Mind you, I should hope it wouldn't as :raw ignores locale, apparently.
> 
> In a nutshell, the code below should put \xc3\x84 into the output file and
> not \xc4 as it is doing. Well, I presume it should and no one is saying 
> otherwise.

No, it shouldn't put the bytes \xc3\x84 into the file, except on perl 5.8.0
with a UTF8 locale, or on 5.8.1 or later run with the correct -C flag to say
"pay attention to a UTF8 locale". (5.8.0's behaviour was documented, but
found to be undesirable.)

> #!/usr/bin/perl -w
> use Encode(_utf8_on);
> my $data = "\xC3\x84";
> _utf8_on($data);
> open FH, ">aa";
> print FH $data ;
> print length($data);

As is, except for the cases noted above, the file handle is assumed to be
8 bit, not UTF8. After _utf8_on, $data holds the single character U+00C4,
which is why length($data) reports 1. Perl 5 makes the assumption (arguably
wrong, but we're stuck with it now) that 8 bit file handles would like
ISO-8859-1, and writes out that character as the single ISO-8859-1 byte \xc4.
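
To see that default behaviour in isolation, here is a minimal sketch. It
assumes a perl run without -C or PERL_UNICODE, uses an in-memory file handle
in place of the file "aa", and swaps in Encode's decode() for _utf8_on to get
the same one-character string; the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Turn the two UTF-8 bytes into one character, U+00C4.
my $data = decode('UTF-8', "\xC3\x84");

# In-memory handle with no encoding layer, standing in for the file "aa".
my $buf = '';
open my $fh, '>', \$buf or die "open: $!";
print {$fh} $data;
close $fh or die "close: $!";

# Show what was written, byte by byte, in hex.
my $hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $buf;
printf "characters: %d, bytes written: %d (%s)\n",
    length($data), length($buf), $hex;
```

Under those assumptions this should report one character and one byte
written, c4.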

If you do this

#!/usr/bin/perl -w
use Encode(_utf8_on);
my $data = "\xC3\x84";
_utf8_on($data);
open FH, ">aa";
binmode FH, ":utf8";    # mark the already-open handle as UTF8
print FH $data;
print length($data);

or this

#!/usr/bin/perl -w
use Encode(_utf8_on);
my $data = "\xC3\x84";
_utf8_on($data);
open FH, ">:utf8", "aa";    # request the UTF8 layer at open time
print FH $data;
print length($data);

to tell perl that the file handle is expecting UTF8 rather than the default,
then you get a 2 byte output file containing \xc3\x84.
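
As a rough check that the two forms are equivalent, the sketch below writes
the same character through each and compares the file sizes. The helper names
size_via_binmode and size_via_open_mode are made up for this example, and the
temporary files come from File::Temp rather than a file called "aa".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# One character, U+00C4.
my $char = chr(0xC4);

# Hypothetical helper: open normally, then add the layer with binmode.
sub size_via_binmode {
    my (undef, $path) = tempfile(UNLINK => 1);
    open my $fh, '>', $path or die "open $path: $!";
    binmode $fh, ':utf8';
    print {$fh} $char;
    close $fh or die "close $path: $!";
    return -s $path;
}

# Hypothetical helper: request the layer in the open mode itself.
sub size_via_open_mode {
    my (undef, $path) = tempfile(UNLINK => 1);
    open my $fh, '>:utf8', $path or die "open $path: $!";
    print {$fh} $char;
    close $fh or die "close $path: $!";
    return -s $path;
}

my $via_binmode = size_via_binmode();
my $via_open    = size_via_open_mode();
print "binmode: $via_binmode byte(s), open mode: $via_open byte(s)\n";
```

Both sizes should come out as 2, matching the 2 byte file described above.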

Nicholas Clark
