On Wed, Mar 16, 2005 at 10:23:01AM +0000, [EMAIL PROTECTED] wrote:
> LANG is set to en_GB.
> With some messing about I have managed to create an en_GB.utf8.
> Setting LANG to that makes no difference to the perl output, as does setting
> LC_ALL.
> Mind you, I should hope it wouldn't as :raw ignores locale, apparently.
>
> In a nutshell, the code below should put \xc3\x84 into the output file and
> not \xc4 as it is doing. Well, I presume it should and no one is saying
> otherwise.

No, it shouldn't put the bytes \xc3\x84 into the file (except on perl 5.8.0
with a UTF8 locale, or 5.8.1 or later run with the correct -C flag to say
"pay attention to a UTF8 locale"; 5.8.0's behaviour was documented, but found
to be undesirable).

> #!/usr/bin/perl -w
> use Encode(_utf8_on);
> my $data = "\xC3\x84";
> _utf8_on($data);
> open FH, ">aa";
> print FH $data ;
> print length($data);

As is, except for the cases noted above, the file handle is assumed to be
8 bit, not UTF8. Perl 5 makes the assumption (arguably wrong, but we're stuck
with it now) that 8 bit file handles would like ISO-8859-1, and writes out
your characters as ISO-8859-1. Once the UTF8 flag is on, your string is the
single character U+00C4, and that is the one byte \xc4 in ISO-8859-1.

If you do this

#!/usr/bin/perl -w
use Encode(_utf8_on);
my $data = "\xC3\x84";
_utf8_on($data);
open FH, ">aa";
binmode FH, ":utf8";
print FH $data ;
print length($data);

or this

#!/usr/bin/perl -w
use Encode(_utf8_on);
my $data = "\xC3\x84";
_utf8_on($data);
open FH, ">:utf8", "aa";
print FH $data ;
print length($data);

to tell perl that the file handle is expecting UTF8 rather than the default,
then you get a 2 byte file output.

Nicholas Clark
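
For anyone wanting to check this on their own perl, here is a minimal sketch that writes the same character through three handles and dumps the bytes that land on disk. It is an illustration, not the code from the thread: bytes_written is a hypothetical helper name, the files go into a temporary directory rather than "aa", and the string is built with Encode's decode() instead of _utf8_on (both leave a single character U+00C4 here).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempdir);

# Hypothetical helper: write $string to $path through $layer, then read the
# file back as raw bytes and return them as a hex string such as "c3 84".
sub bytes_written {
    my ($path, $layer, $string) = @_;

    open my $out, ">$layer", $path or die "open $path: $!";
    print {$out} $string;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close $in;

    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

my $dir = tempdir(CLEANUP => 1);

# Decode the two UTF-8 bytes into one character, U+00C4.
my $char = decode('UTF-8', "\xC3\x84");

my $default = bytes_written("$dir/default", '',      $char);  # no layer given
my $raw     = bytes_written("$dir/raw",     ':raw',  $char);  # explicit byte handle
my $utf8    = bytes_written("$dir/utf8",    ':utf8', $char);  # UTF8 handle

print "length:  ", length($char), "\n";
print "default: $default\n";
print "raw:     $raw\n";
print "utf8:    $utf8\n";
```

Assuming no -C flag, PERL_UNICODE setting or "use open" pragma is in effect, the default and :raw lines should both show the single byte c4, and the :utf8 line the two bytes c3 84. The length is 1 in every case, because the string holds one character however it is written out.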